hadoop-common-issues mailing list archives

From "Andrew Purtell (Resolved) (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (HADOOP-8126) [Coprocessors] Add hooks for bulk loading actions
Date Thu, 01 Mar 2012 07:30:06 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-8126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HADOOP-8126.

    Resolution: Invalid

Note to self: Don't try to open jiras on the iPhone.

> [Coprocessors] Add hooks for bulk loading actions
> -------------------------------------------------
>                 Key: HADOOP-8126
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8126
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Andrew Purtell
> The API gap for bulk HFile loading was discussed on the mailing list, but it didn't make
> it into a JIRA. It also came up on HBASE-5498.
> See http://search-hadoop.com/m/eEUHK1s4fo81/bulk+loading+and+RegionObservers
> The salient detail:
> {quote}
> A simple and straightforward course of action is to give the CP the option of rewriting
> the submitted store file(s) before the regionserver attempts to validate and move them into
> the store. This is similar to how CPs are hooked into compaction: CPs hook compaction by allowing
> one to wrap the scanner that is iterating over the store files. So the wrapper gets a chance
> to examine the KeyValues being processed and also has an opportunity to modify or drop them.
> Similarly for incoming HFiles for bulk load, the CP could be given a scanner iterating
> over those files, if you had a RegionObserver installed. You would be given the option in
> effect to rewrite the incoming HFiles before they are handed over to the RegionServer for
> addition to the region.
> {quote}
> I think this is a reasonable approach to interface design, because being given a scanner
> highlights the bulk nature of the input. However, I think there should be two hooks here:
> one that gives a simple yes/no answer on whether the bulk load should proceed, and one
> that allows more expensive filtering, transformation, or other processing through a
> scanner-like interface. Bulk loads can be very large, so always requiring a scan over
> them is not a good idea.
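The two-hook design proposed above can be sketched as follows. This is a minimal, language-neutral sketch in Python, not HBase code: real coprocessors are Java classes, and every name here (`BulkLoadObserver`, `pre_bulk_load`, `rewrite_bulk_load`, `needs_scan`, `bulk_load`, and the simplified `KeyValue`) is hypothetical. Simulated HFiles are plain in-memory lists keyed by a made-up file name. The point it shows is that the cheap yes/no hook always runs and never reads cell data, while the scanner-wrapping hook runs only for observers that opt in.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Optional


@dataclass(frozen=True)
class KeyValue:
    # Hypothetical, simplified stand-in for an HBase KeyValue.
    row: bytes
    qualifier: bytes
    value: bytes


class BulkLoadObserver:
    """Hypothetical observer exposing the two proposed bulk load hooks."""

    # Opt-in flag: only observers that set this get the expensive scan hook.
    needs_scan = False

    def pre_bulk_load(self, paths: List[str]) -> bool:
        # Cheap hook: look only at metadata (here, the file names) and
        # answer yes/no without reading any file contents. Default: allow.
        return True

    def rewrite_bulk_load(self, scanner: Iterator[KeyValue]) -> Iterator[KeyValue]:
        # Expensive hook: wrap the scanner over the incoming files, the way
        # compaction hooks wrap the store scanner. Default: pass through.
        return scanner


def bulk_load(observer: BulkLoadObserver,
              hfiles: Dict[str, List[KeyValue]]) -> Optional[List[KeyValue]]:
    """Simulated region-side bulk load; returns loaded cells, or None if vetoed."""
    paths = sorted(hfiles)
    if not observer.pre_bulk_load(paths):
        return None  # vetoed before any cell is read
    cells: Iterator[KeyValue] = (kv for p in paths for kv in hfiles[p])
    if observer.needs_scan:
        cells = observer.rewrite_bulk_load(cells)
    return list(cells)


class MaxFilesObserver(BulkLoadObserver):
    # Example: veto loads with too many files, without ever scanning them.
    def __init__(self, max_files: int) -> None:
        self.max_files = max_files

    def pre_bulk_load(self, paths: List[str]) -> bool:
        return len(paths) <= self.max_files


class DropEmptyValuesObserver(BulkLoadObserver):
    # Example: opt in to the scan hook and drop cells with empty values.
    needs_scan = True

    def rewrite_bulk_load(self, scanner: Iterator[KeyValue]) -> Iterator[KeyValue]:
        return (kv for kv in scanner if kv.value)


if __name__ == "__main__":
    files = {
        "hfile-1": [KeyValue(b"r1", b"q", b"v1"), KeyValue(b"r2", b"q", b"")],
        "hfile-2": [KeyValue(b"r3", b"q", b"v3")],
    }
    print(bulk_load(MaxFilesObserver(1), files))            # None
    print(len(bulk_load(DropEmptyValuesObserver(), files)))  # 2
```

Keeping the scan hook behind an explicit opt-in mirrors the concern above: an observer that only needs an accept/reject decision never pays the cost of iterating over a potentially very large load.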


