hbase-issues mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-4047) [Coprocessors] Generic external process host
Date Sun, 18 Nov 2012 01:30:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13499577#comment-13499577 ]

Andrew Purtell commented on HBASE-4047:

[~asafm] I didn't get beyond some early high-level thoughts, so there is no data yet. But there
will certainly be some performance penalty, because we must introduce an RPC mechanism between
the RegionServer and the child external coprocessor host.

It seems reasonable that the external coprocessor host should handle all IPC concerns: use
Process/ProcessBuilder to launch a child process that hosts the user coprocessor code, and get
access to the child's stdin and stdout.
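
A minimal sketch of the parent side of that arrangement, in Java since Process/ProcessBuilder are Java APIs. It assumes a POSIX system: the `cat` utility stands in for the child coprocessor host and simply echoes back whatever the parent writes to its stdin. The class and method names are hypothetical and are not part of HBase.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: POSIX `cat` stands in for the child coprocessor host;
// it echoes every line the parent writes to its stdin back on its stdout.
class ChildProcessDemo {

    /** Launches the child, sends each message on its stdin, and returns what it echoes on stdout. */
    static List<String> roundTrip(List<String> messages) {
        ProcessBuilder pb = new ProcessBuilder("cat");
        // Let the child's stderr go to ours so it can never fill a pipe and block.
        pb.redirectError(ProcessBuilder.Redirect.INHERIT);
        try {
            Process child = pb.start();
            // Parent -> child: getOutputStream() is connected to the child's stdin.
            // Writing everything before reading is only safe while the child's
            // output fits in the OS pipe buffer; a real host would do both concurrently.
            try (OutputStream toChild = child.getOutputStream()) {
                for (String m : messages) {
                    toChild.write((m + "\n").getBytes(StandardCharsets.UTF_8));
                }
                toChild.flush();
            } // closing stdin sends EOF, so `cat` exits after echoing everything
            // Child -> parent: getInputStream() is connected to the child's stdout.
            List<String> replies = new ArrayList<>();
            try (BufferedReader fromChild = new BufferedReader(
                    new InputStreamReader(child.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = fromChild.readLine()) != null) {
                    replies.add(line);
                }
            }
            child.waitFor();
            return replies;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for child", e);
        }
    }
}
```

For example, `ChildProcessDemo.roundTrip(List.of("prePut region-a"))` should return a one-element list containing `prePut region-a`, the line the child echoed.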

We will need to introduce a new type of Observer to the coprocessor framework that can be a
singleton watching all regions in the RS. Currently we allocate a coprocessor environment for
each region, and an Observer can only see what goes on in that environment (that is, in that one
region). Without a singleton, you can imagine that a RS hosting 1000 regions might need 1000
threads just for IPC between the external coprocessor host in the RS and its children, and not
one child but 1000 of them. That's a nonstarter. So we want one coprocessor in the RS managing
communication with one child, with parent and child together handling all Observer (and
Endpoint) actions on all regions, using NIO to multiplex communication over the input and output
streams set up by Process/ProcessBuilder. How efficiently this can be done, and how low the
latency can be kept, will determine the performance penalty for external coprocessors.
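
The one-child-for-all-regions idea depends on tagging each message so that a single pair of streams can carry traffic for every region. The sketch below assumes a hypothetical frame layout (call ID, region name, payload); the parent keeps one pending future per call ID and completes it when the matching reply arrives, in whatever order replies come back. It uses plain java.io streams rather than NIO: the streams a Process hands back are ordinary InputStream/OutputStream objects, not selectable channels, so a real implementation would still need a reader thread or some other bridge to feed `dispatchOneReply`. None of these names are HBase APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical frame layouts (not an HBase API):
//   request = [callId:long][region:UTF][payload:UTF]   parent -> child
//   reply   = [callId:long][result:UTF]                child -> parent
// One stream pair carries traffic for every region; the callId lets the
// parent route each reply back to the caller waiting for it.
class RegionMultiplexer {
    private final DataOutputStream toChild;
    private final AtomicLong nextCallId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();

    RegionMultiplexer(OutputStream childStdin) {
        this.toChild = new DataOutputStream(childStdin);
    }

    /** Sends one region's request and returns a future completed when its reply arrives. */
    CompletableFuture<String> send(String region, String payload) {
        long callId = nextCallId.incrementAndGet(); // first call gets ID 1
        CompletableFuture<String> reply = new CompletableFuture<>();
        pending.put(callId, reply);
        try {
            // Writers serving different regions must not interleave mid-frame.
            synchronized (toChild) {
                toChild.writeLong(callId);
                toChild.writeUTF(region);
                toChild.writeUTF(payload);
                toChild.flush();
            }
        } catch (IOException e) {
            pending.remove(callId);
            throw new UncheckedIOException(e);
        }
        return reply;
    }

    /** Reads one reply frame from the child's stdout and completes the matching future. */
    void dispatchOneReply(DataInputStream childStdout) {
        try {
            long callId = childStdout.readLong();
            String result = childStdout.readUTF();
            CompletableFuture<String> waiter = pending.remove(callId);
            if (waiter != null) {
                waiter.complete(result);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int pendingCalls() {
        return pending.size();
    }

    /** Encodes a reply frame the way the (hypothetical) child would write it. */
    static byte[] encodeReply(long callId, String result) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeLong(callId);
            out.writeUTF(result);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

With two regions sharing one pipe, calls 1 and 2 can be answered out of order and each future still receives its own reply; only the shared stream, not a per-region thread, is needed on the wire.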
> [Coprocessors] Generic external process host
> --------------------------------------------
>                 Key: HBASE-4047
>                 URL: https://issues.apache.org/jira/browse/HBASE-4047
>             Project: HBase
>          Issue Type: New Feature
>          Components: Coprocessors
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
> HBase coprocessors deviate substantially from the design (as I understand it) of Google's
> BigTable coprocessors in that we have reimagined them as a framework for internal extension.
> In contrast, BigTable coprocessors run as separate processes colocated with tablet servers.
> The essential trade-off is between performance, flexibility, and possibility on the one hand,
> and the ability to control and enforce resource usage on the other.
> Since the initial design of HBase coprocessors, some additional considerations are in play:
> - Developing computational frameworks sitting directly on top of HBase hosted in coprocessor(s);
> - The introduction of the MapReduce next generation (mrng) resource management model, and
>   the probability that limits will be enforced via cgroups at the OS level once cgroups are
>   generally available, e.g. when RHEL 6 deployments are common;
> - The possibility of deploying HBase onto mrng-enabled Hadoop clusters via the mrng resource
>   manager and an HBase-specific application controller.
> Therefore we should consider developing a coprocessor that is a generic host for another
> coprocessor, but one that forks a child process, loads the target coprocessor into the child,
> establishes a bidirectional pipe, and uses an eventing model and umbilical protocol to give the
> coprocessor loaded into the child the same semantics as if it were loaded internally to the
> parent. Eventually, it should also use the resource management capabilities available on the
> platform (perhaps via the mrng resource controller, or directly with cgroups) to limit the
> child as desired by system administrators or the application designer.
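
On the child side, the eventing model can be pictured as a loop that reads one event per frame, dispatches it to whichever handler the loaded coprocessor registered for that hook, and writes back a decision the parent applies as if the observer had run in-process (an in-process HBase observer signals such a decision through `ObserverContext#bypass()`). This is a minimal sketch under stated assumptions: the frame layout, the one-byte "bypass" reply, and the class name are hypothetical, and hook names such as `prePut` are used only as string labels.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical child-side loop for an umbilical protocol (not an HBase API):
//   event = [hook:UTF][region:UTF][row:UTF]   parent -> child
//   reply = [bypass:boolean]                  child -> parent
class ChildEventLoop {
    // hook name -> handler(region, row); returning true asks the parent to bypass
    private final Map<String, BiPredicate<String, String>> handlers = new HashMap<>();

    void register(String hook, BiPredicate<String, String> handler) {
        handlers.put(hook, handler);
    }

    /** Handles events until the parent closes the pipe; returns how many were handled. */
    int run(InputStream fromParent, OutputStream toParent) {
        DataInputStream events = new DataInputStream(fromParent);
        DataOutputStream replies = new DataOutputStream(toParent);
        int handled = 0;
        try {
            while (true) {
                String hook;
                try {
                    hook = events.readUTF();
                } catch (EOFException eof) {
                    break; // clean EOF between frames: the parent has shut us down
                }
                String region = events.readUTF(); // EOF mid-frame surfaces as an error
                String row = events.readUTF();
                BiPredicate<String, String> handler = handlers.get(hook);
                // Hooks with no handler default to "continue normally".
                boolean bypass = handler != null && handler.test(region, row);
                replies.writeBoolean(bypass);
                replies.flush();
                handled++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return handled;
    }

    /** Encodes one event frame the way the (hypothetical) parent would write it. */
    static byte[] encodeEvent(String hook, String region, String row) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(hook);
            out.writeUTF(region);
            out.writeUTF(row);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }
}
```

Keeping the reply to a single decision per event means the parent can wait on each event synchronously and preserve the ordering an internally loaded observer would see, at the cost of one round trip per hook invocation.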

This message is automatically generated by JIRA.
