hbase-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Josh Elser <els...@apache.org>
Subject Re: [DISCUSSION] Removing the bypass semantic from the Coprocessor APIs
Date Thu, 19 Oct 2017 15:09:19 GMT
Thanks all!

On 10/18/17 12:46 AM, Stack wrote:
> I was going to pick up on the bypass after HBASE-19007 lands, cleaning up
> our exposure of Master/RegionServerServices to Coprocessors (HBASE-19007
> was going bad for a good while but lots of contributors and good discussion
> and now I think we have it). Shouldn't be too much longer.
> 
> Its CP API so I was figuring it an alpha-4 item.
> 
> St.Ack
> 
> On Tue, Oct 17, 2017 at 6:56 PM, 张铎(Duo Zhang) <palomino219@gmail.com>
> wrote:
> 
>> Fine. Let me change the title of HBASE-18770 and prepare a patch there.
>>
>> May still a week or two before alpha4 I think. The scan injection, and
>> flush/compaction trigger/track API is still unstable...
>>
>> 2017-10-18 6:12 GMT+08:00 Josh Elser <elserj@apache.org>:
>>
>>> (catching up here)
>>>
>>> I'm glad to see you fine folks came to a conclusion around a
>> reduced-scope
>>> solution (correct me if I'm wrong). "Some" bypass mechanism would stay
>> for
>>> preXXX methods, and we'd remove it for the other methods? What exactly
>> the
>>> "bypass API" would be is up in the air, correct?
>>>
>>> Duo -- maybe you could put the "current plan" on HBASE-18770 since
>>> discussion appears to have died down?
>>>
>>> I was originally lamenting yet another big, sweeping change to CPs when I
>>> had expected alpha-4 to have already landed. But, let me play devil's
>>> advocate: is this something we still think is critical to do in alpha-4?
>> I
>>> can respect wanting to get address all of these smells, but I'd be worry
>> it
>>> delays us further.
>>>
>>>
>>> On 10/11/17 9:53 PM, 张铎(Duo Zhang) wrote:
>>>
>>>> Creating an exception is expensive so if it is not suggested to do it
>> in a
>>>> normal case. A common trick is to create a global exception instance,
>> and
>>>> always throw it to avoid creating every time but I think it is more
>>>> friendly to just use a return value?
>>>>
>>>> And for me, the bypass after preXXX for normal region operations just
>>>> equals to a 'cancel', which is very clear and easy to understand, so I
>>>> think it is OK to add bypass support for them. And also for compaction
>> and
>>>> flush, it is OK to give CP users the ability to cancel the operation as
>>>> the
>>>> semantic is clear, although I'm not sure how CP users would use this
>>>> feature.
>>>>
>>>> In general, I think we can provide bypass/cancel support in preXXX
>> methods
>>>> where it is the very beginning of an operation.
>>>>
>>>> Thanks.
>>>>
>>>> 2017-10-12 3:10 GMT+08:00 Andrew Purtell <apurtell@apache.org>:
>>>>
>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix wants to
>> use
>>>>>>
>>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>>> selectively.
>>>>>
>>>>> If we can handle the rest of the trouble that you observed:
>>>>>
>>>>> 1) Lack of recognition and identification of when the key value to
>>>>> increment doesn't exist
>>>>> 2) Lack of the ability to set the timestamp of the updated key value.
>>>>>
>>>>> then they might be able to make it work. Perhaps a conversion from
>> HBase
>>>>> native to Phoenix LONG encoding when processing results, in the
>> wrapping
>>>>> scanner, informed by schema metadata.
>>>>>
>>>>> Or if we are keeping the bypass semantic in select places but
>>>>> implementing
>>>>> it with something other than today's bypass() API (please) this would
>> be
>>>>> another candidate for where to keep it. Duo suggests keeping the
>> semantic
>>>>> in all of the basic RPC preXXX hooks for query and mutation. We could
>>>>> redo
>>>>> those APIs to skip normal processing based on a return value or
>> exception
>>>>> but otherwise drop bypass from all the others. It will clean up areas
>> of
>>>>> confusion, e.g. can I bypass splits or flushes or not? Or what about
>> this
>>>>> arcane hook in compaction? Or [insert some deep hook here]? The answer
>>>>> would be: only RPC hooks will early out, and only if you return this
>>>>> value,
>>>>> or throw that exception.
>>>>>
>>>>>
>>>>> On Wed, Oct 11, 2017 at 11:56 AM, Stack <stack@duboce.net> wrote:
>>>>>
>>>>> The YARN Timeline Server has the FlowRunCoprocessor. It does bypass
>> when
>>>>>> user does a Get returning instead the result of its own (Flow) Scan
>>>>>>
>>>>> result.
>>>>>
>>>>>> Not sure how we'd do alternative here; Timeline Server is keeping
Tags
>>>>>> internally.
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 11, 2017 at 10:59 AM, Andrew Purtell <apurtell@apache.org
>>>
>>>>>> wrote:
>>>>>>
>>>>>> Rather than continue to support a weird bypass() which works in some
>>>>>>>
>>>>>> places
>>>>>>
>>>>>>> and not in others, perhaps we can substitute it with an exception?
So
>>>>>>>
>>>>>> if
>>>>>
>>>>>> the coprocessor throws this exception in the pre hook then where
it is
>>>>>>> allowed we catch it and do the right thing, and where it is not
>> allowed
>>>>>>>
>>>>>> we
>>>>>>
>>>>>>> don't catch it and the server aborts. This will at least improve
the
>>>>>>>
>>>>>> silent
>>>>>>
>>>>>>> bypass() failure problem. I also don't like, in retrospect, that
>>>>>>>
>>>>>> calling
>>>>>
>>>>>> this environment method has magic side effects. Everyone understands
>>>>>>>
>>>>>> how
>>>>>
>>>>>> exceptions work, so it will be clearer.
>>>>>>>
>>>>>>>
>>>>>>> We could do that though throw and catch of exceptions would be
>> costly.
>>>>>>
>>>>>> What about the Duo suggestion? Purge bypass flag and replace it w/
>>>>>> preXXX
>>>>>> in a few select methods returning a boolean on whether bypass? Would
>>>>>> that
>>>>>> work? (Would have to figure metrics still).
>>>>>>
>>>>>>
>>>>>>
>>>>>> In any case we should try to address the Tephra and Phoenix cases
>>>>>>>
>>>>>> brought
>>>>>
>>>>>> up in this discussion. They look like we can find alternatives. Shall
>> I
>>>>>>> file JIRAs to follow up?
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Phoenix Increment by-pass, an ornery item is that Phoenix
wants to
>>>>>> use
>>>>>> its long encoding writing Increments. Not sure how we'd do that,
>>>>>> selectively.
>>>>>>
>>>>>> St.Ack
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Oct 11, 2017 at 6:00 AM, 张铎(Duo Zhang) <palomino219@gmail.com
>>>
>>>>>>> wrote:
>>>>>>>
>>>>>>> These examples are great.
>>>>>>>>
>>>>>>>> And I think for normal region operations such as get, put,
delete,
>>>>>>>> checkAndXXX, increment, it is OK to bypass the real operation
after
>>>>>>>>
>>>>>>> preXXX
>>>>>>>
>>>>>>>> as the semantic is clear enough. Instead of calling env.bypass,
>> maybe
>>>>>>>>
>>>>>>> just
>>>>>>>
>>>>>>>> let these preXXX methods return a boolean is enough to tell
the
>> HBase
>>>>>>>> framework that we have already done the real operation so
just give
>>>>>>>>
>>>>>>> up
>>>>>
>>>>>> and
>>>>>>>
>>>>>>>> return?
>>>>>>>>
>>>>>>>> Thanks.
>>>>>>>>
>>>>>>>> 2017-10-11 3:19 GMT+08:00 Gary Helmling <ghelmling@gmail.com>:
>>>>>>>>
>>>>>>>> The Tephra TransactionProcessor CP makes use of bypass()
in
>>>>>>>>>
>>>>>>>> preDelete()
>>>>>>
>>>>>>> to
>>>>>>>>
>>>>>>>>> override handling of delete tombstones in a transactional
way:
>>>>>>>>> https://github.com/apache/incubator-tephra/blob/master/
>>>>>>>>> tephra-hbase-compat-1.3/src/main/java/org/apache/tephra/
>>>>>>>>>
>>>>>>>> hbase/coprocessor/
>>>>>>>>
>>>>>>>>> TransactionProcessor.java#L244
>>>>>>>>>
>>>>>>>>> The CDAP IncrementHandler CP also makes use of bypass()
in
>>>>>>>>>
>>>>>>>> preGetOp()
>>>>>
>>>>>> and
>>>>>>>
>>>>>>>> preIncrementAfterRRowLock() to provide a transaction implementation
>>>>>>>>>
>>>>>>>> of
>>>>>>
>>>>>>> readless increments:
>>>>>>>>> https://github.com/caskdata/cdap/blob/develop/cdap-hbase-
>>>>>>>>> compat-1.1/src/main/java/co/cask/cdap/data2/increment/
>>>>>>>>> hbase11/IncrementHandler.java#L121
>>>>>>>>>
>>>>>>>>> What would be the alternate approach for these applications?
 In
>>>>>>>>>
>>>>>>>> both
>>>>>
>>>>>> cases
>>>>>>>>
>>>>>>>>> they need to impose their own semantics on the underlying
KeyValue
>>>>>>>>> storage.  Is there a different way this can be done?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Tue, Oct 10, 2017 at 11:58 AM Anoop John <anoop.hbase@gmail.com
>>>>>>>>>
>>>>>>>>
>>>>>> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>> Wrap core scanners is different right?  That can be done
in post
>>>>>>>>>> hooks.  I have seen many use cases for this..  Its
the question
>>>>>>>>>>
>>>>>>>>> abt
>>>>>
>>>>>> the pre hooks where we have not yet created the core object (like
>>>>>>>>>> scanner).  The CP pre code itself doing the work
of object
>>>>>>>>>>
>>>>>>>>> creation
>>>>>
>>>>>> and so the core code is been bypassed.    Well the wrapping thing
>>>>>>>>>>
>>>>>>>>> can
>>>>>>
>>>>>>> be done in pre hook also. First create the core object by CP
code
>>>>>>>>>> itself and then do the wrapped object and return..
I have seen in
>>>>>>>>>>
>>>>>>>>> one
>>>>>>
>>>>>>> jira issue where the usage was this way..   The wrapping can
be
>>>>>>>>>>
>>>>>>>>> done
>>>>>>
>>>>>>> in post also in such cases I believe.
>>>>>>>>>>
>>>>>>>>>> -Anoop-
>>>>>>>>>>
>>>>>>>>>> On Wed, Oct 11, 2017 at 12:23 AM, Andrew Purtell
<
>>>>>>>>>>
>>>>>>>>> apurtell@apache.org>
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> I think we should continue to support overriding
function by
>>>>>>>>>>>
>>>>>>>>>> object
>>>>>>
>>>>>>> inheritance. I didn't mention this and am not proposing more
>>>>>>>>>>>
>>>>>>>>>> than
>>>>>
>>>>>> removing
>>>>>>>>>>
>>>>>>>>>>> the bypass() sematic. No more no less. Phoenix
absolutely
>>>>>>>>>>>
>>>>>>>>>> depends
>>>>>
>>>>>> on
>>>>>>>
>>>>>>>> being
>>>>>>>>>>
>>>>>>>>>>> able to wrap core scanners and return the wrappers.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Oct 10, 2017 at 11:50 AM, Anoop John
<
>>>>>>>>>>>
>>>>>>>>>> anoop.hbase@gmail.com>
>>>>>>>
>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> When we say bypass the core code, it can be done
today not
>>>>>>>>>>>>
>>>>>>>>>>> only
>>>>>
>>>>>> by
>>>>>>
>>>>>>> calling bypass but by returning a not null object for some of
>>>>>>>>>>>>
>>>>>>>>>>> the
>>>>>>
>>>>>>> pre
>>>>>>>>
>>>>>>>>> hooks.  Like preScannerOpen() if it return a scanner
object,
>>>>>>>>>>>>
>>>>>>>>>>> we
>>>>>
>>>>>> will
>>>>>>>
>>>>>>>> avoid the remaining core code execution for creation of the
>>>>>>>>>>>> scanner(s).  So this proposal include this
aspect also and
>>>>>>>>>>>>
>>>>>>>>>>> remove
>>>>>>
>>>>>>> any
>>>>>>>>
>>>>>>>>> possible way of bypassing the core code by the CP hook
code
>>>>>>>>>>>>
>>>>>>>>>>> execution
>>>>>>>>
>>>>>>>>> ?   Am +1.
>>>>>>>>>>>>
>>>>>>>>>>>> -Anoop-
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Oct 10, 2017 at 11:40 PM, Andrew
Purtell <
>>>>>>>>>>>>
>>>>>>>>>>> apurtell@apache.org
>>>>>>>>
>>>>>>>>>
>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> The coprocessor API provides an environment
method,
>>>>>>>>>>>>>
>>>>>>>>>>>> bypass(),
>>>>>
>>>>>> that
>>>>>>>
>>>>>>>> when
>>>>>>>>>>
>>>>>>>>>>> called from a preXXX hook will cause the core
code to skip
>>>>>>>>>>>>>
>>>>>>>>>>>> all
>>>>>
>>>>>> remaining
>>>>>>>>>>
>>>>>>>>>>> processing. This capability was introduced on
HBASE-3348.
>>>>>>>>>>>>>
>>>>>>>>>>>> Since
>>>>>>
>>>>>>> this
>>>>>>>>
>>>>>>>>> time I
>>>>>>>>>>>>
>>>>>>>>>>>>> think we are more enlightened about the
complications of
>>>>>>>>>>>>>
>>>>>>>>>>>> this
>>>>>
>>>>>> feature.
>>>>>>>>>
>>>>>>>>>> (Or,
>>>>>>>>>>>>
>>>>>>>>>>>>> anyway, speaking for myself:)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Not all hooks provide the bypass semantic.
Where this is the
>>>>>>>>>>>>>
>>>>>>>>>>>> case
>>>>>>>
>>>>>>>> the
>>>>>>>>>
>>>>>>>>>> javadoc for the hook says so, but it can be missed.
If you
>>>>>>>>>>>>>
>>>>>>>>>>>> call
>>>>>>
>>>>>>> bypass()
>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>>>
>>>>>>>>>>>>> a hook where it is not supported it is
a no-op. This can
>>>>>>>>>>>>>
>>>>>>>>>>>> lead
>>>>>
>>>>>> to a
>>>>>>>
>>>>>>>> poor
>>>>>>>>>>
>>>>>>>>>>> developer experience.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Where bypass is supported what is being
bypassed is all of
>>>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>
>>>>>> core
>>>>>>>>
>>>>>>>>> code
>>>>>>>>>>
>>>>>>>>>>> implementing the remainder of the operation.
In order to
>>>>>>>>>>>>>
>>>>>>>>>>>> understand
>>>>>>>>
>>>>>>>>> what
>>>>>>>>>>
>>>>>>>>>>> calling bypass() will skip, a coprocessor implementer
should
>>>>>>>>>>>>>
>>>>>>>>>>>> read
>>>>>>>
>>>>>>>> and
>>>>>>>>>
>>>>>>>>>> understand all of the remaining code and its nuances.
>>>>>>>>>>>>>
>>>>>>>>>>>> Although I
>>>>>>
>>>>>>> think
>>>>>>>>>
>>>>>>>>>> this
>>>>>>>>>>>>
>>>>>>>>>>>>> is good practice for coprocessor developers
in general, it
>>>>>>>>>>>>>
>>>>>>>>>>>> demands a
>>>>>>>>
>>>>>>>>> lot. I
>>>>>>>>>>>>
>>>>>>>>>>>>> think it would provide a much better
developer experience if
>>>>>>>>>>>>>
>>>>>>>>>>>> we
>>>>>>
>>>>>>> didn't
>>>>>>>>>
>>>>>>>>>> allow bypass, even though it means - in theory -
a
>>>>>>>>>>>>>
>>>>>>>>>>>> coprocessor
>>>>>
>>>>>> would
>>>>>>>>
>>>>>>>>> be a
>>>>>>>>>>
>>>>>>>>>>> lot more limited in some ways than before. What
is skipped
>>>>>>>>>>>>>
>>>>>>>>>>>> is
>>>>>
>>>>>> extremely
>>>>>>>>>>
>>>>>>>>>>> version dependent. That core code will vary,
perhaps
>>>>>>>>>>>>>
>>>>>>>>>>>> significantly,
>>>>>>>>
>>>>>>>>> even
>>>>>>>>>>
>>>>>>>>>>> between point releases. We do not provide the
promise of
>>>>>>>>>>>>>
>>>>>>>>>>>> consistent
>>>>>>>>
>>>>>>>>> behavior even between point releases for the bypass
>>>>>>>>>>>>>
>>>>>>>>>>>> semantic.
>>>>>
>>>>>> To
>>>>>>
>>>>>>> achieve
>>>>>>>>>>
>>>>>>>>>>> that we could not change any code between hook
points.
>>>>>>>>>>>>>
>>>>>>>>>>>> Therefore
>>>>>>
>>>>>>> the
>>>>>>>>
>>>>>>>>> coprocessor implementer becomes an HBase core developer
in
>>>>>>>>>>>>>
>>>>>>>>>>>> practice
>>>>>>>>
>>>>>>>>> as
>>>>>>>>>
>>>>>>>>>> soon
>>>>>>>>>>>>
>>>>>>>>>>>>> as they rely on bypass(). Every release
of HBase may break
>>>>>>>>>>>>>
>>>>>>>>>>>> the
>>>>>
>>>>>> assumption
>>>>>>>>>>
>>>>>>>>>>> that the replacement for the bypassed code takes
care of all
>>>>>>>>>>>>>
>>>>>>>>>>>> necessary
>>>>>>>>>
>>>>>>>>>> skipped concerns. Because those concerns can change
at any
>>>>>>>>>>>>>
>>>>>>>>>>>> point,
>>>>>>>
>>>>>>>> such an
>>>>>>>>>>
>>>>>>>>>>> assumption is never safe.
>>>>>>>>>>>>>
>>>>>>>>>>>>> I say "in theory" because I would be
surprised if anyone is
>>>>>>>>>>>>>
>>>>>>>>>>>> relying
>>>>>>>>
>>>>>>>>> on
>>>>>>>>>
>>>>>>>>>> the
>>>>>>>>>>>>
>>>>>>>>>>>>> bypass for the above reason. I seem to
recall that Phoenix
>>>>>>>>>>>>>
>>>>>>>>>>>> might
>>>>>>
>>>>>>> use
>>>>>>>>
>>>>>>>>> it
>>>>>>>>>>
>>>>>>>>>>> in
>>>>>>>>>>>>
>>>>>>>>>>>>> one place to promote a normal mutation
into an atomic
>>>>>>>>>>>>>
>>>>>>>>>>>> operation,
>>>>>>
>>>>>>> by
>>>>>>>>
>>>>>>>>> substituting one for the other, but if so that objective
>>>>>>>>>>>>>
>>>>>>>>>>>> could
>>>>>
>>>>>> be
>>>>>>>
>>>>>>>> reimplemented using their new locking manager.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>
>>>>>
>>>>
>>
> 

Mime
View raw message