hbase-issues mailing list archives

From "Andrew Purtell (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5498) Secure Bulk Load
Date Thu, 01 Mar 2012 06:34:06 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219838#comment-13219838 ]

Andrew Purtell commented on HBASE-5498:
---------------------------------------

The API gap was discussed on the mailing list but it didn't make it into a JIRA.

See http://search-hadoop.com/m/eEUHK1s4fo81/bulk+loading+and+RegionObservers

The salient detail:
{quote}
A simple and straightforward course of action is to give the CP the option of rewriting the
submitted store file(s) before the regionserver attempts to validate and move them into the
store. This is similar to how CPs are hooked into compaction: CPs hook compaction by allowing
one to wrap the scanner that is iterating over the store files. So the wrapper gets a chance
to examine the KeyValues being processed and also has an opportunity to modify or drop them.
 
Similarly for incoming HFiles for bulk load, the CP could be given a scanner iterating over
those files, if you had a RegionObserver installed. You would be given the option in effect
to rewrite the incoming HFiles before they are handed over to the RegionServer for addition
to the region.
{quote}
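
To make the quoted idea concrete, here is a minimal, self-contained sketch of a wrapper that sits in front of a scanner over incoming files and can modify or drop each entry. Everything in it is an assumption for illustration: `Kv` is a stand-in for KeyValue, `RewritingScanner` is a hypothetical name, and a plain `Iterator` stands in for the real scanner interface. It is not the HBase coprocessor API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

class ScannerWrapSketch {
    /** Stand-in for an HBase KeyValue; hypothetical, not the real class. */
    static final class Kv {
        final String row;
        final String value;
        Kv(String row, String value) { this.row = row; this.value = value; }
    }

    /**
     * Wraps a source scanner. The rewrite function may return a modified
     * entry, or null to drop the entry entirely.
     */
    static final class RewritingScanner implements Iterator<Kv> {
        private final Iterator<Kv> source;
        private final Function<Kv, Kv> rewrite;
        private Kv pending; // next entry that survived the rewrite, if any

        RewritingScanner(Iterator<Kv> source, Function<Kv, Kv> rewrite) {
            this.source = source;
            this.rewrite = rewrite;
        }

        @Override public boolean hasNext() {
            // Skip over dropped entries until one survives or the source ends.
            while (pending == null && source.hasNext()) {
                pending = rewrite.apply(source.next());
            }
            return pending != null;
        }

        @Override public Kv next() {
            if (!hasNext()) throw new NoSuchElementException();
            Kv out = pending;
            pending = null;
            return out;
        }
    }

    /** Drops the row named "secret" and upper-cases every other value. */
    static List<String> demo() {
        List<Kv> incoming = Arrays.asList(
            new Kv("r1", "a"), new Kv("secret", "b"), new Kv("r2", "c"));
        Function<Kv, Kv> rewrite = kv ->
            kv.row.equals("secret") ? null : new Kv(kv.row, kv.value.toUpperCase());

        Iterator<Kv> wrapped = new RewritingScanner(incoming.iterator(), rewrite);
        List<String> loaded = new ArrayList<>();
        while (wrapped.hasNext()) {
            Kv kv = wrapped.next();
            loaded.add(kv.row + "=" + kv.value);
        }
        return loaded;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [r1=A, r2=C]
    }
}
```

The point of the shape is that the party consuming the scanner never needs to know a wrapper is present; it simply sees fewer or different entries.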

I think this is a reasonable approach to interface design, because being handed a scanner
highlights the bulk nature of the input. However, I think there should be two hooks here:
one that returns a simple yes/no answer as to whether the bulk load should proceed, and one
that allows more expensive filtering or transformation through a scanner-like interface.
Bulk loads can be very large, so always requiring a scan over them is not a good idea.
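
A rough sketch of what the two hooks could look like follows. All names here (`BulkLoadHooks`, `allowBulkLoad`, `rewrite`, `TwoHookSketch`) are hypothetical and chosen only for illustration; strings stand in for rows, and a list stands in for the submitted files. The assumption is that an observer which does not opt in to rewriting leaves the files untouched, so no scan happens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Optional;

/** Hypothetical observer interface with two hooks; not the HBase API. */
interface BulkLoadHooks {
    /** Cheap gate: decides from request metadata only, never reads file contents. */
    boolean allowBulkLoad(String user, String table);

    /**
     * Optional expensive hook: return a scanner that filters or transforms
     * the incoming rows. The default opts out, so nothing is scanned.
     */
    default Optional<Iterator<String>> rewrite(List<String> incoming) {
        return Optional.empty();
    }
}

class TwoHookSketch {
    enum Outcome { REJECTED, MOVED_AS_IS, REWRITTEN }

    static final class Result {
        final Outcome outcome;
        final List<String> rows;
        Result(Outcome outcome, List<String> rows) {
            this.outcome = outcome;
            this.rows = rows;
        }
    }

    static Result bulkLoad(BulkLoadHooks hooks, String user, String table,
                           List<String> incoming) {
        if (!hooks.allowBulkLoad(user, table)) {
            // Rejected before any data is read.
            return new Result(Outcome.REJECTED, new ArrayList<>());
        }
        Optional<Iterator<String>> scanner = hooks.rewrite(incoming);
        if (!scanner.isPresent()) {
            // No rewrite requested: hand the input over untouched (no scan, no copy).
            return new Result(Outcome.MOVED_AS_IS, incoming);
        }
        // Rewrite requested: drain the wrapped scanner into a new set of rows.
        List<String> rewritten = new ArrayList<>();
        Iterator<String> it = scanner.get();
        while (it.hasNext()) {
            rewritten.add(it.next());
        }
        return new Result(Outcome.REWRITTEN, rewritten);
    }

    /** Example observer that allows every load but drops rows starting with "tmp". */
    static BulkLoadHooks dropTmpRows() {
        return new BulkLoadHooks() {
            @Override public boolean allowBulkLoad(String user, String table) {
                return true;
            }
            @Override public Optional<Iterator<String>> rewrite(List<String> incoming) {
                return Optional.of(
                    incoming.stream().filter(r -> !r.startsWith("tmp")).iterator());
            }
        };
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "tmp1", "b");
        BulkLoadHooks gateOnly = (user, table) -> user.equals("alice");

        System.out.println(bulkLoad(gateOnly, "bob", "t1", rows).outcome);
        System.out.println(bulkLoad(gateOnly, "alice", "t1", rows).outcome);
        System.out.println(bulkLoad(dropTmpRows(), "alice", "t1", rows).rows);
    }
}
```

Because the rewrite hook has an opt-out default, an observer that only cares about authorization can be written as a one-line lambda and never pays for a scan.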

IMO, transferring ownership at the HDFS level can be done with a 'chown' enhancement, as
suggested.
                
> Secure Bulk Load
> ----------------
>
>                 Key: HBASE-5498
>                 URL: https://issues.apache.org/jira/browse/HBASE-5498
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Francis Liu
>
> Design doc: https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
> Short summary:
> Security as it stands does not cover the bulkLoadHFiles() feature. Users calling this
> method will bypass ACLs. Loading is also more cumbersome in a secure setting because
> of HDFS privileges: bulkLoadHFiles() moves the data from the user's directory to the hbase
> directory, which requires certain write access privileges to be set.
> Our solution is to create a coprocessor which makes use of AuthManager to verify whether a
> user has write access to the table. If so, it launches an MR job as the hbase user to do the
> importing (i.e., rewrite from text to HFiles). One tricky part is that this job will have to
> impersonate the calling user when reading the input files. We can do this by expecting the
> user to pass an HDFS delegation token as part of the secureBulkLoad() coprocessor call and
> extending an InputFormat to make use of that token. The output is written to a temporary
> directory accessible only by hbase, and then bulkLoadHFiles() is called.


        
