hbase-dev mailing list archives

From "Andrew Purtell (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2001) Coprocessors: Colocate arbitrary code with regions
Date Tue, 01 Dec 2009 02:31:25 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784016#action_12784016 ]

Andrew Purtell commented on HBASE-2001:

"Regions contain references to the coprocessor implementation classes associated with them."
Q: On above, it's indeed the classes, not objects?  Objects can cross the split?  Not easily

When regions are split, new coprocessor object instances would be allocated on the daughters
-- one instance for each of the coprocessor classes listed in the region metadata -- when
they are opening and the coprocessor's onOpen method is invoked to give it a chance to initialize.
Prior to this the parent would be informed of the impending split via an onSplit invocation,
and when it closes its onClose method would be called so it can clean up. How to manage the
split beyond this would be the problem of the coprocessor. 
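
The split sequence described above can be sketched as a toy model. This is only an illustration under stated assumptions: the Coprocessor interface, the Region class, the method signatures, and the "-a"/"-b" daughter names are hypothetical, and only the onOpen, onSplit and onClose callback names come from this discussion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical callback interface; only the three method names come from the discussion.
interface Coprocessor {
    void onOpen(String regionName);
    void onSplit(String regionName);
    void onClose(String regionName);
}

// Toy coprocessor that appends every callback it receives to a shared log.
class RecordingCoprocessor implements Coprocessor {
    static final List<String> LOG = new ArrayList<>();

    public void onOpen(String regionName)  { LOG.add("onOpen:" + regionName); }
    public void onSplit(String regionName) { LOG.add("onSplit:" + regionName); }
    public void onClose(String regionName) { LOG.add("onClose:" + regionName); }
}

// Toy region: its metadata lists coprocessor classes; instances are created on open.
class Region {
    final String name;
    final List<Class<? extends Coprocessor>> coprocessorClasses;
    final List<Coprocessor> instances = new ArrayList<>();

    Region(String name, List<Class<? extends Coprocessor>> coprocessorClasses) {
        this.name = name;
        this.coprocessorClasses = coprocessorClasses;
    }

    // Allocate one fresh instance per listed class, then let it initialize.
    void open() {
        for (Class<? extends Coprocessor> cls : coprocessorClasses) {
            try {
                Coprocessor cp = cls.getDeclaredConstructor().newInstance();
                instances.add(cp);
                cp.onOpen(name);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("cannot instantiate " + cls.getName(), e);
            }
        }
    }

    // Let each instance clean up, then drop it.
    void close() {
        for (Coprocessor cp : instances) {
            cp.onClose(name);
        }
        instances.clear();
    }

    // The parent sees onSplit, then onClose; each daughter opens with new instances.
    List<Region> split() {
        for (Coprocessor cp : instances) {
            cp.onSplit(name);
        }
        close();
        Region a = new Region(name + "-a", coprocessorClasses);
        Region b = new Region(name + "-b", coprocessorClasses);
        a.open();
        b.open();
        return Arrays.asList(a, b);
    }
}
```

Because the region holds classes rather than objects, nothing crosses the split: the parent's instances are dropped after onClose, and any state a coprocessor wants to carry over has to be handled by the coprocessor itself.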

Do we need both closing and pendingClose? [...]

I found that state transition in the master code and copied it verbatim from a comment block.
Actually coprocessors only go through three states: opening, open, closing. 
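
As a rough illustration, the three states could be modeled as an enum. The strictly linear order (opening, then open, then closing) and the canTransitionTo helper are assumptions made for this sketch; the discussion only names the states.

```java
// Hypothetical enum; only the three state names come from the discussion.
enum CoprocessorState {
    OPENING, OPEN, CLOSING;

    // Assumed rule: a state may only advance to the one declared right after it.
    boolean canTransitionTo(CoprocessorState next) {
        return next.ordinal() == ordinal() + 1;
    }
}
```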

Why no control over flush?  Maybe it would want to hold up a flush?  You think that too dangerous?

I do think that is too dangerous. 

Rather, should we do the Java events model where one method gets all event types, and the passed
in object says what the event is?  In the method, first thing you check if it's an event you
are interested in.  Makes things easier to implement especially if you are only implementing
part of the functionality.  This model may not make sense though for this context or may be
overkill (See java.util.EventObject and some of its implementations).

I thought about that and go back and forth. Explicit interface is also self-documenting while
arcane gotchas can hide in event specific detail. There's also the notion of using ASM to
weave in policy enforcement. That could be easier if each callback is its own well defined
method. On the other hand there's a lot of foo() { super(); } crap for each callback that
a coprocessor does not care about. My current thinking is the latter does not outweigh the
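
To make the two styles concrete, here is a minimal side-by-side sketch. All type names (RegionCallbacks, RegionCallbacksAdapter, RegionEvent, RegionEventListener and the two counters) are hypothetical; the callback names are borrowed from the Open/Close/Split/Compact list in the issue description, and java.util.EventObject is the class mentioned in the question.

```java
import java.util.EventObject;

// Style 1: an explicit interface with one well-defined method per callback.
interface RegionCallbacks {
    void onOpen();
    void onClose();
    void onSplit();
    void onCompact();
}

// Adapter with empty bodies, so a subclass overrides only the callbacks it cares about.
abstract class RegionCallbacksAdapter implements RegionCallbacks {
    public void onOpen() {}
    public void onClose() {}
    public void onSplit() {}
    public void onCompact() {}
}

// Style 2: a single entry point; the event object says what happened.
class RegionEvent extends EventObject {
    enum Type { OPEN, CLOSE, SPLIT, COMPACT }

    final Type type;

    RegionEvent(Object source, Type type) {
        super(source);
        this.type = type;
    }
}

interface RegionEventListener {
    void onEvent(RegionEvent event);
}

// The same split counter, written once in each style.
class SplitCounter extends RegionCallbacksAdapter {
    int splits;

    @Override
    public void onSplit() {
        splits++;
    }
}

class SplitCountingListener implements RegionEventListener {
    int splits;

    @Override
    public void onEvent(RegionEvent event) {
        // First thing: check whether this is an event we are interested in.
        if (event.type == RegionEvent.Type.SPLIT) {
            splits++;
        }
    }
}
```

An adapter base class is one common way to cut the boilerplate of the explicit style while keeping each callback as its own method, which is the property that per-callback bytecode weaving would rely on.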

By the way, I am thinking about using ASM to weave in CPU and memory accounting and limit
enforcement as a generic code safety policy regardless.

Will Coprocessors make for lots of new object instantiations?  It's going to be invoked on
each Get and Scan.

Not unless the coprocessor does it. 

The logging interface seems odd.  Why define a new one?  Why not just use Apache logging?

The idea is no I/O outside of the interface is allowed. There will be an additional verification
step at classload time, implemented with ASM, that checks against a whitelist. Making the
whitelist to the extent possible a single interface is a simplifying choice.
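
The whitelist decision itself can be sketched independently of the bytecode scan. In the sketch below the step that extracts referenced class names from a coprocessor's bytecode (the part proposed to be done with ASM) is deliberately left out, and the names are simply passed in; WhitelistVerifier and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical verifier: decides whether a set of referenced classes is acceptable.
class WhitelistVerifier {
    private final Set<String> allowed;

    WhitelistVerifier(Collection<String> allowedClassNames) {
        this.allowed = new HashSet<>(allowedClassNames);
    }

    // Returns every referenced class name that is not on the whitelist, in input order.
    List<String> violations(Collection<String> referencedClassNames) {
        List<String> rejected = new ArrayList<>();
        for (String name : referencedClassNames) {
            if (!allowed.contains(name)) {
                rejected.add(name);
            }
        }
        return rejected;
    }

    // A class would be loaded only if nothing it references falls outside the whitelist.
    boolean accepts(Collection<String> referencedClassNames) {
        return violations(referencedClassNames).isEmpty();
    }
}
```

The smaller the whitelist, the easier it is to audit, which is the argument for routing logging and other I/O through a single interface.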

Should we be extracting an Interface from Region so we can have a Region implementation and
so your Coprocessor can have an implementation too?  We sort of did something like that with the
"Incommon" interface we have for testing that allows for implementations that run the
same tests only now against the Region and then against the client-side.  Extracting an 'official'
Region interface sounds grand to me... would help with testing?

That's a good idea. Should be a separate issue? 

How does the PrivateStore persist?  Where?  What are you thinking?

One PrivateStore for each coprocessor would persist as an HFile+log in the region's store.
Would be cloned into daughters on split. Would get periodic compaction whenever the store
is compacted. The general idea is to do something less than manage a real table in a way that
hooks in naturally with store management. I gave it a table interface but it could be just
a bag of KVs if supporting multiple column families in a single HFile+log is too much trouble.

> Coprocessors: Colocate arbitrary code with regions
> --------------------------------------------------
>                 Key: HBASE-2001
>                 URL: https://issues.apache.org/jira/browse/HBASE-2001
>             Project: Hadoop HBase
>          Issue Type: Sub-task
>            Reporter: Andrew Purtell
>            Assignee: Andrew Purtell
>         Attachments: asm-3.2-bin.zip, asm-transformations.pdf, org.apache.hadoop.hbase.HCoprocessor.java,
> "Support arbitrary code that runs next to each region in table. As regions split
> and move, coprocessor code should automatically move also."
> Use classloader which looks on HDFS.
> Associate a list of classes to load with each table. Put this in HRI so it inherits from
> table but can be changed on a per region basis (so then those region specific changes can
> be inherited by daughters).
> Not completely arbitrary code, should require implementation of an interface with callbacks
> * Open
> * Close
> * Split
> * Compact
> * (Multi)get and scanner next()
> * (Multi)put
> * (Multi)delete
> Add method to HRegionInterface for invoking coprocessor methods and retrieving results.
> Add methods in o.a.h.h.regionserver or subpackage which implement convenience functions
> for coprocessor methods and consistent/controlled access to internals: store access, threading,
> persistent and ephemeral state, scratch storage, etc.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
