hbase-issues mailing list archives

From "Francis Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-5498) Secure Bulk Load
Date Mon, 01 Oct 2012 18:15:09 GMT

    [ https://issues.apache.org/jira/browse/HBASE-5498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13467033#comment-13467033 ]

Francis Liu commented on HBASE-5498:

"InternalBulkLoadListener" isn't necessary because there is no "BulkLoadListener" – just
call it BulkLoadListener?
I added "Internal" to disambiguate: it is a listener for only the actual moving of the file,
not for the entire bulk load process, which is what the coprocessor hook covers. I'm fine
either way; I was just worried it would be misunderstood.
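
To make the distinction concrete, here is a minimal sketch of a listener scoped to the file move alone. Every name in it (FileMoveListener, beforeMove, moveDone, moveFailed, moveFile) is hypothetical and chosen for illustration; it is not the interface in the patch, and it uses no HBase classes.

```java
import java.util.ArrayList;
import java.util.List;

public class FileMoveListenerSketch {

    /** Hypothetical listener scoped to the file move only, not the whole bulk load. */
    interface FileMoveListener {
        /** Called before the file is moved; returns the path to actually move. */
        String beforeMove(String family, String srcPath);

        /** Called after the file was moved successfully. */
        void moveDone(String family, String srcPath);

        /** Called when the move failed. */
        void moveFailed(String family, String srcPath);
    }

    /** Records each callback so the order of events can be inspected. */
    static class RecordingListener implements FileMoveListener {
        final List<String> events = new ArrayList<>();

        @Override
        public String beforeMove(String family, String srcPath) {
            events.add("before:" + family);
            return srcPath;
        }

        @Override
        public void moveDone(String family, String srcPath) {
            events.add("done:" + family);
        }

        @Override
        public void moveFailed(String family, String srcPath) {
            events.add("failed:" + family);
        }
    }

    /** Stand-in for the store-level move; the failure is simulated by a flag. */
    static boolean moveFile(String family, String srcPath, boolean simulateFailure,
                            FileMoveListener listener) {
        String path = listener.beforeMove(family, srcPath);
        if (simulateFailure) {
            listener.moveFailed(family, path);
            return false;
        }
        listener.moveDone(family, path);
        return true;
    }

    public static void main(String[] args) {
        RecordingListener listener = new RecordingListener();
        moveFile("cf", "/user/alice/hfile1", false, listener);
        System.out.println(listener.events);
    }
}
```

A coprocessor hook, by contrast, would wrap the whole bulk load request rather than each individual file move.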

The new '// TODO deal with viewFS' in HStore gives me concern. I think this should be implemented,
but don't have a strong opinion. There are other places where this is going to be an issue
I suspect.
My assumption was that HBase wasn't federation compatible yet. If that is true I think it's
safe to push this to that future effort.

In BaseRegionObserver we have "//TODO this should end up as a coprocessor hook" – Those
proposed hooks should be added as part of this change IMO. I don't like the idea of BaseRegionObserver
exporting something not part of the RegionObserver interface. It is supposed to be a default
implementation of that interface not a superset.
I didn't add these as coprocessor hooks because they are security-only methods, which we
don't want to bleed into the core code in 0.94. I added the TODO so we can address this
in 0.96 as part of streamlining things, since we shouldn't need the artificial security
separation in that codebase.

In SecureBulkLoadEndpoint we have "//TODO make this configurable" – This should either be
done or not?
It is already configurable; I seem to have forgotten to remove the TODO.

In SecureTestUtil, should we be loading the SecureBulkLoad support unconditionally? How about
just for the relevant tests?
Not sure what the downside would be? Since it is expected to always be enabled in a secure
deployment, shouldn't it always be available in the tests?

And maybe SecureBulkLoadProxy could be moved out of LoadIncrementalHFiles to a util class?
Perhaps others will want to programmatically import HFiles securely.
I added the proxy to prevent the security code from bleeding into the core code. I can extract
it as a helper class if you think that's useful. It seemed to me that LoadIncrementalHFiles was
the entry point for users who want to do a bulk load, as it does a lot of things that I believe
users wouldn't want to re-implement.

> Secure Bulk Load
> ----------------
>                 Key: HBASE-5498
>                 URL: https://issues.apache.org/jira/browse/HBASE-5498
>             Project: HBase
>          Issue Type: Improvement
>          Components: security
>            Reporter: Francis Liu
>            Assignee: Francis Liu
>             Fix For: 0.94.3, 0.96.0
>         Attachments: HBASE-5498_94.patch, HBASE-5498_94.patch, HBASE-5498_draft_94.patch, HBASE-5498_draft.patch, HBASE-5498_trunk.patch
> Design doc: https://cwiki.apache.org/confluence/display/HCATALOG/HBase+Secure+Bulk+Load
> Short summary:
> Security as it stands does not cover the bulkLoadHFiles() feature. Users calling this
> method will bypass ACLs. Loading is also more cumbersome in a secure setting because
> of HDFS privileges: bulkLoadHFiles() moves the data from the user's directory to the HBase
> directory, which requires certain write access privileges to be set.
> Our solution is to create a coprocessor which uses AuthManager to verify that the
> user has write access to the table. If so, it launches an MR job as the hbase user to do the
> importing (i.e. rewrite from text to HFiles). One tricky part is that this job will have to
> impersonate the calling user when reading the input files. We can do this by expecting the
> user to pass an HDFS delegation token as part of the secureBulkLoad() coprocessor call and
> extending an input format to make use of that token. The output is written to a temporary
> directory accessible only by hbase, and then bulkLoadHFiles() is called.
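
The flow in the summary above can be sketched in plain Java as a rough illustration. This is not the patch's code: the class, the in-memory permission map standing in for the AuthManager check, the string delegation token, and the "/hbase-staging" path are all assumptions made for the example, and no HBase or Hadoop APIs are used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SecureBulkLoadSketch {

    // Hypothetical stand-in for the ACL lookup: table name -> users with write access.
    private final Map<String, Set<String>> writers = new HashMap<>();

    void grantWrite(String table, String user) {
        writers.computeIfAbsent(table, t -> new HashSet<>()).add(user);
    }

    boolean canWrite(String table, String user) {
        return writers.getOrDefault(table, Collections.<String>emptySet()).contains(user);
    }

    /**
     * Models the three steps of the summary: verify write access, require a
     * delegation token for reading the caller's input, then stage the output
     * in a directory only the hbase user could access. Returns the staging path.
     */
    String secureBulkLoad(String user, String table, String delegationToken, String inputDir) {
        // Step 1: reject callers without write access to the table.
        if (!canWrite(table, user)) {
            throw new SecurityException(user + " has no write access to " + table);
        }
        // Step 2: the import job would read inputDir as the caller, so a token is required.
        if (delegationToken == null || delegationToken.isEmpty()) {
            throw new IllegalArgumentException("delegation token required to read " + inputDir);
        }
        // Step 3: output lands in a staging directory (hypothetical path) and would
        // then be handed to the regular bulk load call.
        return "/hbase-staging/" + table + "/" + user;
    }

    public static void main(String[] args) {
        SecureBulkLoadSketch sketch = new SecureBulkLoadSketch();
        sketch.grantWrite("t1", "alice");
        System.out.println(sketch.secureBulkLoad("alice", "t1", "token", "/user/alice/input"));
    }
}
```

The point of the ordering is that the permission check happens before any file is read or moved, so a caller without write access never triggers work performed as the hbase user.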

