Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id E8322200D2F for ; Wed, 18 Oct 2017 06:46:24 +0200 (CEST) Received: by cust-asf.ponee.io (Postfix) id E6B171609EC; Wed, 18 Oct 2017 04:46:24 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id B55F01609EB for ; Wed, 18 Oct 2017 06:46:23 +0200 (CEST) Received: (qmail 15137 invoked by uid 500); 18 Oct 2017 04:46:17 -0000 Mailing-List: contact dev-help@hbase.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@hbase.apache.org Delivered-To: mailing list dev@hbase.apache.org Received: (qmail 15123 invoked by uid 99); 18 Oct 2017 04:46:17 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd2-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Wed, 18 Oct 2017 04:46:17 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd2-us-west.apache.org (ASF Mail Server at spamd2-us-west.apache.org) with ESMTP id 212631A2B32 for ; Wed, 18 Oct 2017 04:46:16 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd2-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: 2.48 X-Spam-Level: ** X-Spam-Status: No, score=2.48 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, HEADER_FROM_DIFFERENT_DOMAINS=0.001, HTML_MESSAGE=2, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H3=-0.01, RCVD_IN_MSPIKE_WL=-0.01, RCVD_IN_SORBS_SPAM=0.5, SPF_PASS=-0.001] autolearn=disabled Authentication-Results: spamd2-us-west.apache.org (amavisd-new); dkim=pass (2048-bit key) header.d=gmail.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd2-us-west.apache.org [10.40.0.9]) (amavisd-new, port 10024) with ESMTP id 6EdNpHprsnEY for ; Wed, 18 Oct 2017 04:46:11 +0000 (UTC) Received: from mail-wm0-f42.google.com (mail-wm0-f42.google.com [74.125.82.42]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 4C4385F4E5 for ; Wed, 18 Oct 2017 04:46:11 +0000 (UTC) Received: by mail-wm0-f42.google.com with SMTP id q124so7441123wmb.0 for ; Tue, 17 Oct 2017 21:46:11 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:in-reply-to:references:from:date:message-id :subject:to; bh=KC7bEvOShu0VKaUp1gmDZ7ktEDjKbjiOSYFVxtnT8bs=; b=shaPrIBkiQrhiAWKEWETSonGiUNrGgiWG5WqUYGosx/lhekTjlfDI5/nw9PZU9uS6z ESPrALiIM+w9jyVp3r6R/WhBXQZmOOxOJT38jLPEr3EEwN3TbUFBea9YHxe0GKBgo7xW mifYM6vp4MVD1RsxyPei+zBXLRPG/Xsj/fOjH2lQNyNkX0VDpW+UH9HNcB07/pY6669L jdAabLfPWtaIaC6u0PBmjn9zH+RKyd3VkpyuDa2mZHHYVY3cJCI7JqaGk0BnqlX4NIIK ie/7+3hiBbsOnH3ghyCzCgLlLEgHhnLmi6RguEMpC8fE7ULM1R4939Tcot/zFku6pA8A Utuw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:in-reply-to:references:from :date:message-id:subject:to; bh=KC7bEvOShu0VKaUp1gmDZ7ktEDjKbjiOSYFVxtnT8bs=; b=mWXatKjG3Rpzu3MaJ3VazRCUriYnFuMRyYToW5ua1Ikr+eODHLP1TMD+6CuPWXeoO+ ObPfU2Zv7ECi9dX1GFXA1EiZ9YnKdFlmLf2+H/XC5ruoqx37K2ItI1hSjCZSSeEbXdK1 DzKdIN0MqRRVnUSLVhInPsrj/6/cD+B0xkT0R/2IhRUzrX1pEVZVnHNwf+vSnZ0RqSsC GooDzze4657xgHkqkS89eIIo5lYQ5rUkEvizEF8YAo/gULeyTbohJCDnPFCjU0ykHqcG EmVSEDOYneRm9kOoDYFN+I4YZi2NtwNjXInnG8PBAlKPqUko9xe+oUW7foRWMxH9OMQ6 YXuA== X-Gm-Message-State: AMCzsaV2IViadATxq5DGA8ZDQCoro+L8OtvORk/2CXmg7ROGdYYtHZ0s ZIw1XkKAApDSCpELPJYDDC/QdGeyduTzD64JQGrt3uDF X-Google-Smtp-Source: ABhQp+ToKEfNc+6TOHfmFayhUFkkZjfeDNok+O6FfnJl2lMYlVF/VVDew4AIvb2ErgZIuTcqqcL1JcNBw5h1JfJPupA= X-Received: by 10.28.215.4 with SMTP id o4mr5037459wmg.0.1508301969738; Tue, 17 Oct 2017 21:46:09 -0700 (PDT) MIME-Version: 1.0 Sender: saint.ack@gmail.com Received: by 10.223.185.57 with HTTP; Tue, 17 Oct 2017 21:46:08 -0700 (PDT) In-Reply-To: References: <66d24c8e-784c-9712-123e-a911df26647a@apache.org> From: Stack Date: Tue, 17 Oct 2017 21:46:08 -0700 X-Google-Sender-Auth: ZsXYJz264fLTh5HAbfQZw5K2eO4 Message-ID: Subject: Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs To: HBase Dev List Content-Type: multipart/alternative; boundary="001a1147199e1ed07f055bcaeb2c" archived-at: Wed, 18 Oct 2017 04:46:25 -0000 --001a1147199e1ed07f055bcaeb2c Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable I was going to pick up on the bypass after HBASE-19007 lands, cleaning up our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007 was going bad for a good while but lots of contributors and good discussion and now I think we have it). Shouldn't be too much longer. Its CP API so I was figuring it an alpha-4 item. St.Ack On Tue, Oct 17, 2017 at 6:56 PM, =E5=BC=A0=E9=93=8E(Duo Zhang) wrote: > Fine. Let me change the title of HBASE-18770 and prepare a patch there. > > May still a week or two before alpha4 I think. The scan injection, and > flush/compaction trigger/track API is still unstable... > > 2017-10-18 6:12 GMT+08:00 Josh Elser : > > > (catching up here) > > > > I'm glad to see you fine folks came to a conclusion around a > reduced-scope > > solution (correct me if I'm wrong). "Some" bypass mechanism would stay > for > > preXXX methods, and we'd remove it for the other methods? What exactly > the > > "bypass API" would be is up in the air, correct? > > > > Duo -- maybe you could put the "current plan" on HBASE-18770 since > > discussion appears to have died down? > > > > I was originally lamenting yet another big, sweeping change to CPs when= I > > had expected alpha-4 to have already landed. But, let me play devil's > > advocate: is this something we still think is critical to do in alpha-4= ? > I > > can respect wanting to get address all of these smells, but I'd be worr= y > it > > delays us further. > > > > > > On 10/11/17 9:53 PM, =E5=BC=A0=E9=93=8E(Duo Zhang) wrote: > > > >> Creating an exception is expensive so if it is not suggested to do it > in a > >> normal case. A common trick is to create a global exception instance, > and > >> always throw it to avoid creating every time but I think it is more > >> friendly to just use a return value? > >> > >> And for me, the bypass after preXXX for normal region operations just > >> equals to a 'cancel', which is very clear and easy to understand, so I > >> think it is OK to add bypass support for them. And also for compaction > and > >> flush, it is OK to give CP users the ability to cancel the operation a= s > >> the > >> semantic is clear, although I'm not sure how CP users would use this > >> feature. > >> > >> In general, I think we can provide bypass/cancel support in preXXX > methods > >> where it is the very beginning of an operation. > >> > >> Thanks. > >> > >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell : > >> > >> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to > use > >>>> > >>> its long encoding writing Increments. Not sure how we'd do that, > >>> selectively. > >>> > >>> If we can handle the rest of the trouble that you observed: > >>> > >>> 1) Lack of recognition and identification of when the key value to > >>> increment doesn't exist > >>> 2) Lack of the ability to set the timestamp of the updated key value. > >>> > >>> then they might be able to make it work. Perhaps a conversion from > HBase > >>> native to Phoenix LONG encoding when processing results, in the > wrapping > >>> scanner, informed by schema metadata. > >>> > >>> Or if we are keeping the bypass semantic in select places but > >>> implementing > >>> it with something other than today's bypass() API (please) this would > be > >>> another candidate for where to keep it. Duo suggests keeping the > semantic > >>> in all of the basic RPC preXXX hooks for query and mutation. We could > >>> redo > >>> those APIs to skip normal processing based on a return value or > exception > >>> but otherwise drop bypass from all the others. It will clean up areas > of > >>> confusion, e.g. can I bypass splits or flushes or not? Or what about > this > >>> arcane hook in compaction? Or [insert some deep hook here]? The answe= r > >>> would be: only RPC hooks will early out, and only if you return this > >>> value, > >>> or throw that exception. > >>> > >>> > >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack wrote: > >>> > >>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass > when > >>>> user does a Get returning instead the result of its own (Flow) Scan > >>>> > >>> result. > >>> > >>>> Not sure how we'd do alternative here; Timeline Server is keeping Ta= gs > >>>> internally. > >>>> > >>>> > >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell > > >>>> wrote: > >>>> > >>>> Rather than continue to support a weird bypass() which works in some > >>>>> > >>>> places > >>>> > >>>>> and not in others, perhaps we can substitute it with an exception? = So > >>>>> > >>>> if > >>> > >>>> the coprocessor throws this exception in the pre hook then where it = is > >>>>> allowed we catch it and do the right thing, and where it is not > allowed > >>>>> > >>>> we > >>>> > >>>>> don't catch it and the server aborts. This will at least improve th= e > >>>>> > >>>> silent > >>>> > >>>>> bypass() failure problem. I also don't like, in retrospect, that > >>>>> > >>>> calling > >>> > >>>> this environment method has magic side effects. Everyone understands > >>>>> > >>>> how > >>> > >>>> exceptions work, so it will be clearer. > >>>>> > >>>>> > >>>>> We could do that though throw and catch of exceptions would be > costly. > >>>> > >>>> What about the Duo suggestion? Purge bypass flag and replace it w/ > >>>> preXXX > >>>> in a few select methods returning a boolean on whether bypass? Would > >>>> that > >>>> work? (Would have to figure metrics still). > >>>> > >>>> > >>>> > >>>> In any case we should try to address the Tephra and Phoenix cases > >>>>> > >>>> brought > >>> > >>>> up in this discussion. They look like we can find alternatives. Shal= l > I > >>>>> file JIRAs to follow up? > >>>>> > >>>>> > >>>>> > >>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants = to > >>>> use > >>>> its long encoding writing Increments. Not sure how we'd do that, > >>>> selectively. > >>>> > >>>> St.Ack > >>>> > >>>> > >>>> > >>>> On Wed, Oct 11, 2017 at 6:00 AM, =E5=BC=A0=E9=93=8E(Duo Zhang) > > >>>>> wrote: > >>>>> > >>>>> These examples are great. > >>>>>> > >>>>>> And I think for normal region operations such as get, put, delete, > >>>>>> checkAndXXX, increment, it is OK to bypass the real operation afte= r > >>>>>> > >>>>> preXXX > >>>>> > >>>>>> as the semantic is clear enough. Instead of calling env.bypass, > maybe > >>>>>> > >>>>> just > >>>>> > >>>>>> let these preXXX methods return a boolean is enough to tell the > HBase > >>>>>> framework that we have already done the real operation so just giv= e > >>>>>> > >>>>> up > >>> > >>>> and > >>>>> > >>>>>> return? > >>>>>> > >>>>>> Thanks. > >>>>>> > >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling : > >>>>>> > >>>>>> The Tephra TransactionProcessor CP makes use of bypass() in > >>>>>>> > >>>>>> preDelete() > >>>> > >>>>> to > >>>>>> > >>>>>>> override handling of delete tombstones in a transactional way: > >>>>>>> https://github.com/apache/incubator-tephra/blob/master/ > >>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/ > >>>>>>> > >>>>>> hbase/coprocessor/ > >>>>>> > >>>>>>> TransactionProcessor.java#L244 > >>>>>>> > >>>>>>> The CDAP IncrementHandler CP also makes use of bypass() in > >>>>>>> > >>>>>> preGetOp() > >>> > >>>> and > >>>>> > >>>>>> preIncrementAfterRRowLock() to provide a transaction implementatio= n > >>>>>>> > >>>>>> of > >>>> > >>>>> readless increments: > >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase- > >>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/ > >>>>>>> hbase11/IncrementHandler.java#L121 > >>>>>>> > >>>>>>> What would be the alternate approach for these applications? In > >>>>>>> > >>>>>> both > >>> > >>>> cases > >>>>>> > >>>>>>> they need to impose their own semantics on the underlying KeyValu= e > >>>>>>> storage. Is there a different way this can be done? > >>>>>>> > >>>>>>> > >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John >>>>>>> > >>>>>> > >>>> wrote: > >>>>>> > >>>>>>> > >>>>>>> Wrap core scanners is different right? That can be done in post > >>>>>>>> hooks. I have seen many use cases for this.. Its the question > >>>>>>>> > >>>>>>> abt > >>> > >>>> the pre hooks where we have not yet created the core object (like > >>>>>>>> scanner). The CP pre code itself doing the work of object > >>>>>>>> > >>>>>>> creation > >>> > >>>> and so the core code is been bypassed. Well the wrapping thing > >>>>>>>> > >>>>>>> can > >>>> > >>>>> be done in pre hook also. First create the core object by CP code > >>>>>>>> itself and then do the wrapped object and return.. I have seen i= n > >>>>>>>> > >>>>>>> one > >>>> > >>>>> jira issue where the usage was this way.. The wrapping can be > >>>>>>>> > >>>>>>> done > >>>> > >>>>> in post also in such cases I believe. > >>>>>>>> > >>>>>>>> -Anoop- > >>>>>>>> > >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell < > >>>>>>>> > >>>>>>> apurtell@apache.org> > >>>>> > >>>>>> wrote: > >>>>>>>> > >>>>>>>>> I think we should continue to support overriding function by > >>>>>>>>> > >>>>>>>> object > >>>> > >>>>> inheritance. I didn't mention this and am not proposing more > >>>>>>>>> > >>>>>>>> than > >>> > >>>> removing > >>>>>>>> > >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely > >>>>>>>>> > >>>>>>>> depends > >>> > >>>> on > >>>>> > >>>>>> being > >>>>>>>> > >>>>>>>>> able to wrap core scanners and return the wrappers. > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John < > >>>>>>>>> > >>>>>>>> anoop.hbase@gmail.com> > >>>>> > >>>>>> wrote: > >>>>>>>> > >>>>>>>>> > >>>>>>>>> When we say bypass the core code, it can be done today not > >>>>>>>>>> > >>>>>>>>> only > >>> > >>>> by > >>>> > >>>>> calling bypass but by returning a not null object for some of > >>>>>>>>>> > >>>>>>>>> the > >>>> > >>>>> pre > >>>>>> > >>>>>>> hooks. Like preScannerOpen() if it return a scanner object, > >>>>>>>>>> > >>>>>>>>> we > >>> > >>>> will > >>>>> > >>>>>> avoid the remaining core code execution for creation of the > >>>>>>>>>> scanner(s). So this proposal include this aspect also and > >>>>>>>>>> > >>>>>>>>> remove > >>>> > >>>>> any > >>>>>> > >>>>>>> possible way of bypassing the core code by the CP hook code > >>>>>>>>>> > >>>>>>>>> execution > >>>>>> > >>>>>>> ? Am +1. > >>>>>>>>>> > >>>>>>>>>> -Anoop- > >>>>>>>>>> > >>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell < > >>>>>>>>>> > >>>>>>>>> apurtell@apache.org > >>>>>> > >>>>>>> > >>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>>> The coprocessor API provides an environment method, > >>>>>>>>>>> > >>>>>>>>>> bypass(), > >>> > >>>> that > >>>>> > >>>>>> when > >>>>>>>> > >>>>>>>>> called from a preXXX hook will cause the core code to skip > >>>>>>>>>>> > >>>>>>>>>> all > >>> > >>>> remaining > >>>>>>>> > >>>>>>>>> processing. This capability was introduced on HBASE-3348. > >>>>>>>>>>> > >>>>>>>>>> Since > >>>> > >>>>> this > >>>>>> > >>>>>>> time I > >>>>>>>>>> > >>>>>>>>>>> think we are more enlightened about the complications of > >>>>>>>>>>> > >>>>>>>>>> this > >>> > >>>> feature. > >>>>>>> > >>>>>>>> (Or, > >>>>>>>>>> > >>>>>>>>>>> anyway, speaking for myself:) > >>>>>>>>>>> > >>>>>>>>>>> Not all hooks provide the bypass semantic. Where this is the > >>>>>>>>>>> > >>>>>>>>>> case > >>>>> > >>>>>> the > >>>>>>> > >>>>>>>> javadoc for the hook says so, but it can be missed. If you > >>>>>>>>>>> > >>>>>>>>>> call > >>>> > >>>>> bypass() > >>>>>>>> > >>>>>>>>> in > >>>>>>>>>> > >>>>>>>>>>> a hook where it is not supported it is a no-op. This can > >>>>>>>>>>> > >>>>>>>>>> lead > >>> > >>>> to a > >>>>> > >>>>>> poor > >>>>>>>> > >>>>>>>>> developer experience. > >>>>>>>>>>> > >>>>>>>>>>> Where bypass is supported what is being bypassed is all of > >>>>>>>>>>> > >>>>>>>>>> the > >>> > >>>> core > >>>>>> > >>>>>>> code > >>>>>>>> > >>>>>>>>> implementing the remainder of the operation. In order to > >>>>>>>>>>> > >>>>>>>>>> understand > >>>>>> > >>>>>>> what > >>>>>>>> > >>>>>>>>> calling bypass() will skip, a coprocessor implementer should > >>>>>>>>>>> > >>>>>>>>>> read > >>>>> > >>>>>> and > >>>>>>> > >>>>>>>> understand all of the remaining code and its nuances. > >>>>>>>>>>> > >>>>>>>>>> Although I > >>>> > >>>>> think > >>>>>>> > >>>>>>>> this > >>>>>>>>>> > >>>>>>>>>>> is good practice for coprocessor developers in general, it > >>>>>>>>>>> > >>>>>>>>>> demands a > >>>>>> > >>>>>>> lot. I > >>>>>>>>>> > >>>>>>>>>>> think it would provide a much better developer experience if > >>>>>>>>>>> > >>>>>>>>>> we > >>>> > >>>>> didn't > >>>>>>> > >>>>>>>> allow bypass, even though it means - in theory - a > >>>>>>>>>>> > >>>>>>>>>> coprocessor > >>> > >>>> would > >>>>>> > >>>>>>> be a > >>>>>>>> > >>>>>>>>> lot more limited in some ways than before. What is skipped > >>>>>>>>>>> > >>>>>>>>>> is > >>> > >>>> extremely > >>>>>>>> > >>>>>>>>> version dependent. That core code will vary, perhaps > >>>>>>>>>>> > >>>>>>>>>> significantly, > >>>>>> > >>>>>>> even > >>>>>>>> > >>>>>>>>> between point releases. We do not provide the promise of > >>>>>>>>>>> > >>>>>>>>>> consistent > >>>>>> > >>>>>>> behavior even between point releases for the bypass > >>>>>>>>>>> > >>>>>>>>>> semantic. > >>> > >>>> To > >>>> > >>>>> achieve > >>>>>>>> > >>>>>>>>> that we could not change any code between hook points. > >>>>>>>>>>> > >>>>>>>>>> Therefore > >>>> > >>>>> the > >>>>>> > >>>>>>> coprocessor implementer becomes an HBase core developer in > >>>>>>>>>>> > >>>>>>>>>> practice > >>>>>> > >>>>>>> as > >>>>>>> > >>>>>>>> soon > >>>>>>>>>> > >>>>>>>>>>> as they rely on bypass(). Every release of HBase may break > >>>>>>>>>>> > >>>>>>>>>> the > >>> > >>>> assumption > >>>>>>>> > >>>>>>>>> that the replacement for the bypassed code takes care of all > >>>>>>>>>>> > >>>>>>>>>> necessary > >>>>>>> > >>>>>>>> skipped concerns. Because those concerns can change at any > >>>>>>>>>>> > >>>>>>>>>> point, > >>>>> > >>>>>> such an > >>>>>>>> > >>>>>>>>> assumption is never safe. > >>>>>>>>>>> > >>>>>>>>>>> I say "in theory" because I would be surprised if anyone is > >>>>>>>>>>> > >>>>>>>>>> relying > >>>>>> > >>>>>>> on > >>>>>>> > >>>>>>>> the > >>>>>>>>>> > >>>>>>>>>>> bypass for the above reason. I seem to recall that Phoenix > >>>>>>>>>>> > >>>>>>>>>> might > >>>> > >>>>> use > >>>>>> > >>>>>>> it > >>>>>>>> > >>>>>>>>> in > >>>>>>>>>> > >>>>>>>>>>> one place to promote a normal mutation into an atomic > >>>>>>>>>>> > >>>>>>>>>> operation, > >>>> > >>>>> by > >>>>>> > >>>>>>> substituting one for the other, but if so that objective > >>>>>>>>>>> > >>>>>>>>>> could > >>> > >>>> be > >>>>> > >>>>>> reimplemented using their new locking manager. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>> > >>> > >> > --001a1147199e1ed07f055bcaeb2c--