hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-11125) Introduce a higher level interface for registering interest in coprocessor upcalls
Date Sat, 27 Dec 2014 19:50:14 GMT

    [ https://issues.apache.org/jira/browse/HBASE-11125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14259470#comment-14259470 ]

Andrew Purtell edited comment on HBASE-11125 at 12/27/14 7:49 PM:
------------------------------------------------------------------

The core principle of the current coprocessor API is minimization of overhead. We have a “kernel
hook” API: extension code executes in the calling thread, which avoids a context switch, and
it works with low level types, which avoids translation costs, allocations, and copying. This
is why the current API has been successful, and we want to retain it, but as a result of this
choice:
# Misbehaving code can take down the server.
# Many low level types that do not and cannot have compatibility guarantees are exposed to
coprocessor applications.
# Interfaces like RegionObserver carry a lot of internal details that might be unrelated to
the task(s) at hand.

This issue focuses on the latter two problems. (The first can be addressed by HBASE-4047.)

A proposal.

Create a new API based around an interface called Extension. An Extension can knit together
coprocessors and plugins.

Extensions would have a method, called at load time, that returns a list of objects whose
types express intentions. Intention types would be fine-grained, expressing:
- A request to listen for an event (read only), a _xxx_Listener, either globally or on a per-table
basis
- A request to intercept an event (read with possible modification or drop), a _xxx_Transformer,
either globally or on a per-table basis
- A request to implement an Endpoint interface (or part of one?)
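Here is a minimal Java sketch of what the intention types might look like. It is meant to make the idea concrete, not to propose final types.

Every name in it is hypothetical and is not an existing HBase API: Extension, Intention, PutListener, PutTransformer, PutOp and AuditExtension. PutOp stands in for the client Put type so the example compiles on its own. A null table scope is assumed to mean "global".

{code:java}
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the client Put type.
final class PutOp {
    final String row;
    PutOp(String row) { this.row = row; }
}

// Hypothetical marker for every intention an Extension can declare.
interface Intention {
    // Assumption: null means global scope, otherwise a single table name.
    default String tableScope() { return null; }
}

// Read-only interest in completed puts (would map to a post hook).
interface PutListener extends Intention {
    void onPut(PutOp put);
}

// Interception of puts before they are applied (would map to a pre hook).
// Assumption: returning null drops the put, returning another object replaces it.
interface PutTransformer extends Intention {
    PutOp transform(PutOp put);
}

// Hypothetical Extension: declares its intentions once, at load time.
interface Extension {
    List<Intention> intentions();
}

// Example: uppercase row keys on table "users", and record every put globally.
final class AuditExtension implements Extension {
    final List<String> log = new ArrayList<>();

    @Override
    public List<Intention> intentions() {
        PutTransformer upper = new PutTransformer() {
            @Override public String tableScope() { return "users"; }
            @Override public PutOp transform(PutOp put) { return new PutOp(put.row.toUpperCase()); }
        };
        PutListener audit = put -> log.add(put.row);
        return List.of(upper, audit);
    }
}
{code}

The runtime would only ever see the returned list. Scope and kind are visible from each object's type and tableScope(), so the extension never names a low level hook.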

As a rule of thumb we would define one intention type for each:
- Invocation of a method of an Observer: _xxx_Transformer for pre hooks, _xxx_Listener for
post hooks, e.g. DeleteTransformer -> preDelete, DeleteListener -> postDelete
- Invocation of a method of a plugin: flush policy, compaction policy, split policy, etc.
- Endpoint
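The naming rule of thumb for observer intentions is mechanical enough to express as a small helper. This is only an illustration. HookNames and hookFor are hypothetical. Plugin intentions (flush, compaction, split policy) and Endpoints are deliberately not handled.

{code:java}
// Hypothetical helper: derives the observer hook an intention type would map to,
// following the rule of thumb (Transformer -> pre hook, Listener -> post hook).
final class HookNames {
    private static final String TRANSFORMER = "Transformer";
    private static final String LISTENER = "Listener";

    static String hookFor(String intentionType) {
        if (intentionType.endsWith(TRANSFORMER)) {
            return "pre" + intentionType.substring(0, intentionType.length() - TRANSFORMER.length());
        }
        if (intentionType.endsWith(LISTENER)) {
            return "post" + intentionType.substring(0, intentionType.length() - LISTENER.length());
        }
        // Plugin and Endpoint intentions do not follow the observer naming rule.
        throw new IllegalArgumentException("not an observer intention: " + intentionType);
    }
}
{code}

For example, hookFor("DeleteTransformer") yields "preDelete" and hookFor("DeleteListener") yields "postDelete".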

A naive implementation would maintain lists of intentions at various hook points. Each operation
might then need to walk and process several lists in turn. I think we can do better and match
the performance of the current API.
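For comparison, the naive approach could look roughly like the following sketch. Each hook point holds a list of transformers, and that list is walked for every operation.

NaiveDispatcher is hypothetical. Hook points are identified by plain strings, and a String stands in for the operation payload. The per-operation cost is a map lookup plus one interface call per registered intention. That is the overhead code generation is meant to avoid.

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical naive dispatcher: per-hook lists of transformers, walked at run time.
final class NaiveDispatcher {
    private final Map<String, List<UnaryOperator<String>>> byHook = new HashMap<>();

    void register(String hook, UnaryOperator<String> transformer) {
        byHook.computeIfAbsent(hook, k -> new ArrayList<>()).add(transformer);
    }

    // Applies every transformer registered at 'hook' in registration order.
    // A null result means a transformer dropped the operation.
    String dispatch(String hook, String payload) {
        for (UnaryOperator<String> t : byHook.getOrDefault(hook, List.of())) {
            payload = t.apply(payload);
            if (payload == null) {
                return null;
            }
        }
        return payload;
    }
}
{code}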

An Extension ClassLoader could generate code for wiring up intentions to low level hooks or
plugin sites. For example, if we have several intentions that map to RegionObserver methods,
we would codegen a BaseRegionObserver subclass, folding in the bytecode of the intentions, and
install it. Or if we find an intention to override the split policy, we would codegen a delegating
split policy implementation, folding in the bytecode of the intention and delegating everything
else to whatever plugin is already installed, then install the result.
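The delegating split policy case might produce something like the class below. This is a hand-written approximation of generated output, not the output of any existing generator.

SplitPolicy is a simplified, hypothetical stand-in; HBase's real RegionSplitPolicy is an abstract class with a different shape. MinSizeSplitIntention is an invented example intention. Its body is copied into the generated method rather than called through an interface.

{code:java}
// Simplified, hypothetical plugin contract (not HBase's RegionSplitPolicy).
interface SplitPolicy {
    boolean shouldSplit(long regionSizeBytes);
    String splitPoint(String midKey);
}

// The policy that is already installed on the server.
final class SizeSplitPolicy implements SplitPolicy {
    private final long maxBytes;

    SizeSplitPolicy(long maxBytes) { this.maxBytes = maxBytes; }

    @Override public boolean shouldSplit(long regionSizeBytes) { return regionSizeBytes > maxBytes; }

    @Override public String splitPoint(String midKey) { return midKey; }
}

// What the extension author writes: an intention that only overrides the split decision.
final class MinSizeSplitIntention {
    boolean shouldSplit(long regionSizeBytes) { return regionSizeBytes >= 2048; }
}

// Hand-written stand-in for generated code: the intention's body is folded into
// shouldSplit, and every other method delegates to the installed policy.
final class GeneratedDelegatingSplitPolicy implements SplitPolicy {
    private final SplitPolicy delegate;

    GeneratedDelegatingSplitPolicy(SplitPolicy delegate) { this.delegate = delegate; }

    @Override public boolean shouldSplit(long regionSizeBytes) {
        return regionSizeBytes >= 2048; // inlined from MinSizeSplitIntention.shouldSplit
    }

    @Override public String splitPoint(String midKey) { return delegate.splitPoint(midKey); }
}
{code}

Because the intention's logic sits directly in the generated method, a call costs the same as a hand-written policy. Methods the intention does not mention keep the installed behavior.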

It will not be necessary to have complete coverage of all coprocessor hooks in the collection
of intention types for the higher level API to be useful. We should start with straightforward
cases and then extend it over time. Consider RegionObserver#preBatchMutate. We don't want
to expose MiniBatchOperationInProgress; it is too tied to low level details of how the regionserver
processes batch RPCs. Instead, we'd collect intentions scoped narrowly to mutation types (Append,
Increment, Put) and synthesize a hook for preBatchMutate as needed. Or, consider RegionObserver#preCheckAndDelete.
We might want to combine Get and Delete intentions into a synthetic hook for preCheckAndDelete,
but not have an explicit CheckAndDelete intention, which would expose an RPC detail. Design for
different cases can be done in subtasks.
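A synthesized preBatchMutate hook could be sketched as follows. The sketch assumes the extension registered narrow per-type transformers for Put and Increment and none for Append.

The mutation classes are hypothetical stand-ins for the HBase client types. A plain List stands in for MiniBatchOperationInProgress, so that type never reaches the extension. A transformer returning null is assumed to drop that mutation.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for the client mutation types.
abstract class MutationOp {
    final String row;
    MutationOp(String row) { this.row = row; }
}
final class PutOp extends MutationOp { PutOp(String row) { super(row); } }
final class AppendOp extends MutationOp { AppendOp(String row) { super(row); } }
final class IncrementOp extends MutationOp { IncrementOp(String row) { super(row); } }

// Sketch of a synthesized batch hook built from narrow per-type intentions.
final class SynthesizedBatchHook {
    private final UnaryOperator<PutOp> putTransformer;             // null if no Put intention
    private final UnaryOperator<IncrementOp> incrementTransformer; // null if no Increment intention

    SynthesizedBatchHook(UnaryOperator<PutOp> put, UnaryOperator<IncrementOp> increment) {
        this.putTransformer = put;
        this.incrementTransformer = increment;
    }

    // Applies the matching transformer to each mutation; others pass through unchanged.
    List<MutationOp> preBatchMutate(List<MutationOp> batch) {
        List<MutationOp> out = new ArrayList<>(batch.size());
        for (MutationOp m : batch) {
            MutationOp result = m;
            if (m instanceof PutOp && putTransformer != null) {
                result = putTransformer.apply((PutOp) m);
            } else if (m instanceof IncrementOp && incrementTransformer != null) {
                result = incrementTransformer.apply((IncrementOp) m);
            }
            if (result != null) {
                out.add(result);
            }
        }
        return out;
    }
}
{code}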

Code generation allows us to decouple intention types from internals. For example, a PutTransformer
would be installed as a RegionObserver with an implemented prePut method. This is what prePut
hooks look like today:

{code}
void prePut(ObserverContext<RegionCoprocessorEnvironment> c, Put put, WALEdit edit,
    Durability durability) throws IOException;
{code}

Ideally the PutTransformer intention type should only know about the Put type and have a reference
to a context if it needs to be stateful. We can carefully add state to the intention type
for controlling durability. We should have a separate intention for modifying WALEdits, and
we can do this without leaking out the WALEdit type. Yet the transformer code would still run
in a prePut hook and keep its performance. We could even change the signature of RegionObserver#prePut
at any time, provided the code generator that maps intentions to the low level implementation
is updated likewise (setting aside other considerations for the moment).
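The decoupling described above might look like this in practice. PutTransformer, PutContext, LowLevelPutHook and PutTransformerAdapter are hypothetical.

PutOp, WalEditStub and the Durability enum are simplified stand-ins for Put, WALEdit and HBase's Durability, so the sketch compiles on its own. The adapter plays the role of the generated prePut glue. It keeps the WALEdit and observer context to itself and hands the intention only the Put, plus a context that can change durability.

{code:java}
// Stand-in mirroring the values of HBase's Durability enum.
enum Durability { USE_DEFAULT, SKIP_WAL, ASYNC_WAL, SYNC_WAL, FSYNC_WAL }

// Hypothetical, simplified stand-in for the client Put.
final class PutOp {
    final String row;
    Durability durability = Durability.USE_DEFAULT;
    PutOp(String row) { this.row = row; }
}

// Stand-in for WALEdit; never shown to the intention.
final class WalEditStub { }

// Hypothetical intention type: knows only the Put and a narrow context.
interface PutTransformer {
    void transform(PutOp put, PutContext ctx);
}

// Hypothetical context: durability control without exposing WALEdit.
interface PutContext {
    void setDurability(Durability d);
}

// Stand-in for the low level prePut hook shape quoted above.
interface LowLevelPutHook {
    void prePut(Object observerContext, PutOp put, WalEditStub edit, Durability durability);
}

// Hand-written equivalent of generated glue between the intention and the hook.
final class PutTransformerAdapter implements LowLevelPutHook {
    private final PutTransformer transformer;

    PutTransformerAdapter(PutTransformer transformer) { this.transformer = transformer; }

    @Override
    public void prePut(Object observerContext, PutOp put, WalEditStub edit, Durability durability) {
        // observerContext, edit and durability stay here; the intention sees only the Put.
        transformer.transform(put, d -> put.durability = d);
    }
}
{code}

If the low level prePut signature changed, only the adapter (that is, the generator's template) would need updating. Existing PutTransformer implementations would be unaffected.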

We would aim for code generation that can be maintained by committers who are not experts in JVM
internals. That said, some complexity is unavoidable. I think the promise of composable, fine grained
intentions, the supportability that comes from hiding internal types behind the API, and the implied
performance of “inlining” intentions into straight line code for low level hooks could be well worth
it. We can mitigate maintenance risks by placing the Extension API and code generator in
its own Maven module. This module would provide a system level coprocessor that must be installed
via site configuration for experimental “Extension” API support. It would be optional
and decoupled from the client and server core modules.

Because we are keeping the low level "kernel-hook"-style API, the lack of access to internal
types and the lack of functional coverage in a higher level API wouldn't be a problem. An implementor
could always resort to direct use of low level interfaces. Of course, we would prefer to work
out how to implement the desired extension in higher level terms.


> Introduce a higher level interface for registering interest in coprocessor upcalls
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-11125
>                 URL: https://issues.apache.org/jira/browse/HBASE-11125
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Andrew Purtell
>            Priority: Critical
>
> We should introduce a higher level interface for managing the registration of 'user'
> code for execution from the low level hooks. It should not be necessary for coprocessor implementers
> to learn the universe of available low level hooks and the subtleties of their placement within
> HBase core code. Instead the higher level API should allow the implementer to describe their
> intent and then this API should choose the appropriate low level hook placement.
> A very desirable side effect is a layer of indirection between coprocessor implementers
> and the actual hooks. This will address the perennial complaint that the low level hooks change
> too much from release to release, as recently discussed during the RM panel at HBaseCon. If
> we try to avoid changing the particular placement and arguments of hook functions in response
> to those complaints, this can be an onerous constraint on necessary internals evolution. Instead
> we can direct coprocessor implementers to consider the new API and provide the same interface
> stability guarantees there as we do for client API,



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
