hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stack <st...@duboce.net>
Subject Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs
Date Wed, 18 Oct 2017 04:46:08 GMT
I was going to pick up on the bypass after HBASE-19007 lands, cleaning up
our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007
was going bad for a good while but lots of contributors and good discussion
and now I think we have it). Shouldn't be too much longer.

Its CP API so I was figuring it an alpha-4 item.

St.Ack

On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <palomino219@gmail.com>
wrote:

> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
>
> May still a week or two before alpha4 I think. The scan injection, and
> flush/compaction trigger/track API is still unstable...
>
> 2017-10-18 6:12 GMT+08:00 Josh Elser <elserj@apache.org>:
>
> > (catching up here)
> >
> > I'm glad to see you fine folks came to a conclusion around a
> reduced-scope
> > solution (correct me if I'm wrong). "Some" bypass mechanism would stay
> for
> > preXXX methods, and we'd remove it for the other methods? What exactly
> the
> > "bypass API" would be is up in the air, correct?
> >
> > Duo -- maybe you could put the "current plan" on HBASE-18770 since
> > discussion appears to have died down?
> >
> > I was originally lamenting yet another big, sweeping change to CPs when I
> > had expected alpha-4 to have already landed. But, let me play devil's
> > advocate: is this something we still think is critical to do in alpha-4?
> I
> > can respect wanting to get address all of these smells, but I'd be worry
> it
> > delays us further.
> >
> >
> > On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
> >
> >> Creating an exception is expensive so if it is not suggested to do it
> in a
> >> normal case. A common trick is to create a global exception instance,
> and
> >> always throw it to avoid creating every time but I think it is more
> >> friendly to just use a return value?
> >>
> >> And for me, the bypass after preXXX for normal region operations just
> >> equals to a 'cancel', which is very clear and easy to understand, so I
> >> think it is OK to add bypass support for them. And also for compaction
> and
> >> flush, it is OK to give CP users the ability to cancel the operation as
> >> the
> >> semantic is clear, although I'm not sure how CP users would use this
> >> feature.
> >>
> >> In general, I think we can provide bypass/cancel support in preXXX
> methods
> >> where it is the very beginning of an operation.
> >>
> >> Thanks.
> >>
> >> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <apurtell@apache.org>:
> >>
> >> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
> use
> >>>>
> >>> its long encoding writing Increments. Not sure how we'd do that,
> >>> selectively.
> >>>
> >>> If we can handle the rest of the trouble that you observed:
> >>>
> >>> 1) Lack of recognition and identification of when the key value to
> >>> increment doesn't exist
> >>> 2) Lack of the ability to set the timestamp of the updated key value.
> >>>
> >>> then they might be able to make it work. Perhaps a conversion from
> HBase
> >>> native to Phoenix LONG encoding when processing results, in the
> wrapping
> >>> scanner, informed by schema metadata.
> >>>
> >>> Or if we are keeping the bypass semantic in select places but
> >>> implementing
> >>> it with something other than today's bypass() API (please) this would
> be
> >>> another candidate for where to keep it. Duo suggests keeping the
> semantic
> >>> in all of the basic RPC preXXX hooks for query and mutation. We could
> >>> redo
> >>> those APIs to skip normal processing based on a return value or
> exception
> >>> but otherwise drop bypass from all the others. It will clean up areas
> of
> >>> confusion, e.g. can I bypass splits or flushes or not? Or what about
> this
> >>> arcane hook in compaction? Or [insert some deep hook here]? The answer
> >>> would be: only RPC hooks will early out, and only if you return this
> >>> value,
> >>> or throw that exception.
> >>>
> >>>
> >>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <stack@duboce.net> wrote:
> >>>
> >>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass
> when
> >>>> user does a Get returning instead the result of its own (Flow) Scan
> >>>>
> >>> result.
> >>>
> >>>> Not sure how we'd do alternative here; Timeline Server is keeping Tags
> >>>> internally.
> >>>>
> >>>>
> >>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <apurtell@apache.org
> >
> >>>> wrote:
> >>>>
> >>>> Rather than continue to support a weird bypass() which works in some
> >>>>>
> >>>> places
> >>>>
> >>>>> and not in others, perhaps we can substitute it with an exception?
So
> >>>>>
> >>>> if
> >>>
> >>>> the coprocessor throws this exception in the pre hook then where it
is
> >>>>> allowed we catch it and do the right thing, and where it is not
> allowed
> >>>>>
> >>>> we
> >>>>
> >>>>> don't catch it and the server aborts. This will at least improve
the
> >>>>>
> >>>> silent
> >>>>
> >>>>> bypass() failure problem. I also don't like, in retrospect, that
> >>>>>
> >>>> calling
> >>>
> >>>> this environment method has magic side effects. Everyone understands
> >>>>>
> >>>> how
> >>>
> >>>> exceptions work, so it will be clearer.
> >>>>>
> >>>>>
> >>>>> We could do that though throw and catch of exceptions would be
> costly.
> >>>>
> >>>> What about the Duo suggestion? Purge bypass flag and replace it w/
> >>>> preXXX
> >>>> in a few select methods returning a boolean on whether bypass? Would
> >>>> that
> >>>> work? (Would have to figure metrics still).
> >>>>
> >>>>
> >>>>
> >>>> In any case we should try to address the Tephra and Phoenix cases
> >>>>>
> >>>> brought
> >>>
> >>>> up in this discussion. They look like we can find alternatives. Shall
> I
> >>>>> file JIRAs to follow up?
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants
to
> >>>> use
> >>>> its long encoding writing Increments. Not sure how we'd do that,
> >>>> selectively.
> >>>>
> >>>> St.Ack
> >>>>
> >>>>
> >>>>
> >>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <palomino219@gmail.com
> >
> >>>>> wrote:
> >>>>>
> >>>>> These examples are great.
> >>>>>>
> >>>>>> And I think for normal region operations such as get, put, delete,
> >>>>>> checkAndXXX, increment, it is OK to bypass the real operation
after
> >>>>>>
> >>>>> preXXX
> >>>>>
> >>>>>> as the semantic is clear enough. Instead of calling env.bypass,
> maybe
> >>>>>>
> >>>>> just
> >>>>>
> >>>>>> let these preXXX methods return a boolean is enough to tell
the
> HBase
> >>>>>> framework that we have already done the real operation so just
give
> >>>>>>
> >>>>> up
> >>>
> >>>> and
> >>>>>
> >>>>>> return?
> >>>>>>
> >>>>>> Thanks.
> >>>>>>
> >>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <ghelmling@gmail.com>:
> >>>>>>
> >>>>>> The Tephra TransactionProcessor CP makes use of bypass() in
> >>>>>>>
> >>>>>> preDelete()
> >>>>
> >>>>> to
> >>>>>>
> >>>>>>> override handling of delete tombstones in a transactional
way:
> >>>>>>> https://github.com/apache/incubator-tephra/blob/master/
> >>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/
> >>>>>>>
> >>>>>> hbase/coprocessor/
> >>>>>>
> >>>>>>> TransactionProcessor.java#L244
> >>>>>>>
> >>>>>>> The CDAP IncrementHandler CP also makes use of bypass()
in
> >>>>>>>
> >>>>>> preGetOp()
> >>>
> >>>> and
> >>>>>
> >>>>>> preIncrementAfterRRowLock() to provide a transaction implementation
> >>>>>>>
> >>>>>> of
> >>>>
> >>>>> readless increments:
> >>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
> >>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
> >>>>>>> hbase11/IncrementHandler.java#L121
> >>>>>>>
> >>>>>>> What would be the alternate approach for these applications?
 In
> >>>>>>>
> >>>>>> both
> >>>
> >>>> cases
> >>>>>>
> >>>>>>> they need to impose their own semantics on the underlying
KeyValue
> >>>>>>> storage.  Is there a different way this can be done?
> >>>>>>>
> >>>>>>>
> >>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <anoop.hbase@gmail.com
> >>>>>>>
> >>>>>>
> >>>> wrote:
> >>>>>>
> >>>>>>>
> >>>>>>> Wrap core scanners is different right?  That can be done
in post
> >>>>>>>> hooks.  I have seen many use cases for this..  Its the
question
> >>>>>>>>
> >>>>>>> abt
> >>>
> >>>> the pre hooks where we have not yet created the core object (like
> >>>>>>>> scanner).  The CP pre code itself doing the work of
object
> >>>>>>>>
> >>>>>>> creation
> >>>
> >>>> and so the core code is been bypassed.    Well the wrapping thing
> >>>>>>>>
> >>>>>>> can
> >>>>
> >>>>> be done in pre hook also. First create the core object by CP code
> >>>>>>>> itself and then do the wrapped object and return.. I
have seen in
> >>>>>>>>
> >>>>>>> one
> >>>>
> >>>>> jira issue where the usage was this way..   The wrapping can be
> >>>>>>>>
> >>>>>>> done
> >>>>
> >>>>> in post also in such cases I believe.
> >>>>>>>>
> >>>>>>>> -Anoop-
> >>>>>>>>
> >>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell <
> >>>>>>>>
> >>>>>>> apurtell@apache.org>
> >>>>>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> I think we should continue to support overriding
function by
> >>>>>>>>>
> >>>>>>>> object
> >>>>
> >>>>> inheritance. I didn't mention this and am not proposing more
> >>>>>>>>>
> >>>>>>>> than
> >>>
> >>>> removing
> >>>>>>>>
> >>>>>>>>> the bypass() sematic. No more no less. Phoenix absolutely
> >>>>>>>>>
> >>>>>>>> depends
> >>>
> >>>> on
> >>>>>
> >>>>>> being
> >>>>>>>>
> >>>>>>>>> able to wrap core scanners and return the wrappers.
> >>>>>>>>>
> >>>>>>>>>
> >>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John <
> >>>>>>>>>
> >>>>>>>> anoop.hbase@gmail.com>
> >>>>>
> >>>>>> wrote:
> >>>>>>>>
> >>>>>>>>>
> >>>>>>>>> When we say bypass the core code, it can be done
today not
> >>>>>>>>>>
> >>>>>>>>> only
> >>>
> >>>> by
> >>>>
> >>>>> calling bypass but by returning a not null object for some of
> >>>>>>>>>>
> >>>>>>>>> the
> >>>>
> >>>>> pre
> >>>>>>
> >>>>>>> hooks.  Like preScannerOpen() if it return a scanner object,
> >>>>>>>>>>
> >>>>>>>>> we
> >>>
> >>>> will
> >>>>>
> >>>>>> avoid the remaining core code execution for creation of the
> >>>>>>>>>> scanner(s).  So this proposal include this aspect
also and
> >>>>>>>>>>
> >>>>>>>>> remove
> >>>>
> >>>>> any
> >>>>>>
> >>>>>>> possible way of bypassing the core code by the CP hook code
> >>>>>>>>>>
> >>>>>>>>> execution
> >>>>>>
> >>>>>>> ?   Am +1.
> >>>>>>>>>>
> >>>>>>>>>> -Anoop-
> >>>>>>>>>>
> >>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew Purtell
<
> >>>>>>>>>>
> >>>>>>>>> apurtell@apache.org
> >>>>>>
> >>>>>>>
> >>>>>>>> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> The coprocessor API provides an environment
method,
> >>>>>>>>>>>
> >>>>>>>>>> bypass(),
> >>>
> >>>> that
> >>>>>
> >>>>>> when
> >>>>>>>>
> >>>>>>>>> called from a preXXX hook will cause the core code
to skip
> >>>>>>>>>>>
> >>>>>>>>>> all
> >>>
> >>>> remaining
> >>>>>>>>
> >>>>>>>>> processing. This capability was introduced on HBASE-3348.
> >>>>>>>>>>>
> >>>>>>>>>> Since
> >>>>
> >>>>> this
> >>>>>>
> >>>>>>> time I
> >>>>>>>>>>
> >>>>>>>>>>> think we are more enlightened about the
complications of
> >>>>>>>>>>>
> >>>>>>>>>> this
> >>>
> >>>> feature.
> >>>>>>>
> >>>>>>>> (Or,
> >>>>>>>>>>
> >>>>>>>>>>> anyway, speaking for myself:)
> >>>>>>>>>>>
> >>>>>>>>>>> Not all hooks provide the bypass semantic.
Where this is the
> >>>>>>>>>>>
> >>>>>>>>>> case
> >>>>>
> >>>>>> the
> >>>>>>>
> >>>>>>>> javadoc for the hook says so, but it can be missed.
If you
> >>>>>>>>>>>
> >>>>>>>>>> call
> >>>>
> >>>>> bypass()
> >>>>>>>>
> >>>>>>>>> in
> >>>>>>>>>>
> >>>>>>>>>>> a hook where it is not supported it is a
no-op. This can
> >>>>>>>>>>>
> >>>>>>>>>> lead
> >>>
> >>>> to a
> >>>>>
> >>>>>> poor
> >>>>>>>>
> >>>>>>>>> developer experience.
> >>>>>>>>>>>
> >>>>>>>>>>> Where bypass is supported what is being
bypassed is all of
> >>>>>>>>>>>
> >>>>>>>>>> the
> >>>
> >>>> core
> >>>>>>
> >>>>>>> code
> >>>>>>>>
> >>>>>>>>> implementing the remainder of the operation. In
order to
> >>>>>>>>>>>
> >>>>>>>>>> understand
> >>>>>>
> >>>>>>> what
> >>>>>>>>
> >>>>>>>>> calling bypass() will skip, a coprocessor implementer
should
> >>>>>>>>>>>
> >>>>>>>>>> read
> >>>>>
> >>>>>> and
> >>>>>>>
> >>>>>>>> understand all of the remaining code and its nuances.
> >>>>>>>>>>>
> >>>>>>>>>> Although I
> >>>>
> >>>>> think
> >>>>>>>
> >>>>>>>> this
> >>>>>>>>>>
> >>>>>>>>>>> is good practice for coprocessor developers
in general, it
> >>>>>>>>>>>
> >>>>>>>>>> demands a
> >>>>>>
> >>>>>>> lot. I
> >>>>>>>>>>
> >>>>>>>>>>> think it would provide a much better developer
experience if
> >>>>>>>>>>>
> >>>>>>>>>> we
> >>>>
> >>>>> didn't
> >>>>>>>
> >>>>>>>> allow bypass, even though it means - in theory - a
> >>>>>>>>>>>
> >>>>>>>>>> coprocessor
> >>>
> >>>> would
> >>>>>>
> >>>>>>> be a
> >>>>>>>>
> >>>>>>>>> lot more limited in some ways than before. What
is skipped
> >>>>>>>>>>>
> >>>>>>>>>> is
> >>>
> >>>> extremely
> >>>>>>>>
> >>>>>>>>> version dependent. That core code will vary, perhaps
> >>>>>>>>>>>
> >>>>>>>>>> significantly,
> >>>>>>
> >>>>>>> even
> >>>>>>>>
> >>>>>>>>> between point releases. We do not provide the promise
of
> >>>>>>>>>>>
> >>>>>>>>>> consistent
> >>>>>>
> >>>>>>> behavior even between point releases for the bypass
> >>>>>>>>>>>
> >>>>>>>>>> semantic.
> >>>
> >>>> To
> >>>>
> >>>>> achieve
> >>>>>>>>
> >>>>>>>>> that we could not change any code between hook points.
> >>>>>>>>>>>
> >>>>>>>>>> Therefore
> >>>>
> >>>>> the
> >>>>>>
> >>>>>>> coprocessor implementer becomes an HBase core developer
in
> >>>>>>>>>>>
> >>>>>>>>>> practice
> >>>>>>
> >>>>>>> as
> >>>>>>>
> >>>>>>>> soon
> >>>>>>>>>>
> >>>>>>>>>>> as they rely on bypass(). Every release
of HBase may break
> >>>>>>>>>>>
> >>>>>>>>>> the
> >>>
> >>>> assumption
> >>>>>>>>
> >>>>>>>>> that the replacement for the bypassed code takes
care of all
> >>>>>>>>>>>
> >>>>>>>>>> necessary
> >>>>>>>
> >>>>>>>> skipped concerns. Because those concerns can change
at any
> >>>>>>>>>>>
> >>>>>>>>>> point,
> >>>>>
> >>>>>> such an
> >>>>>>>>
> >>>>>>>>> assumption is never safe.
> >>>>>>>>>>>
> >>>>>>>>>>> I say "in theory" because I would be surprised
if anyone is
> >>>>>>>>>>>
> >>>>>>>>>> relying
> >>>>>>
> >>>>>>> on
> >>>>>>>
> >>>>>>>> the
> >>>>>>>>>>
> >>>>>>>>>>> bypass for the above reason. I seem to recall
that Phoenix
> >>>>>>>>>>>
> >>>>>>>>>> might
> >>>>
> >>>>> use
> >>>>>>
> >>>>>>> it
> >>>>>>>>
> >>>>>>>>> in
> >>>>>>>>>>
> >>>>>>>>>>> one place to promote a normal mutation into
an atomic
> >>>>>>>>>>>
> >>>>>>>>>> operation,
> >>>>
> >>>>> by
> >>>>>>
> >>>>>>> substituting one for the other, but if so that objective
> >>>>>>>>>>>
> >>>>>>>>>> could
> >>>
> >>>> be
> >>>>>
> >>>>>> reimplemented using their new locking manager.
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>
> >>>
> >>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message