crunch-dev mailing list archives

From "Jan Van Besien (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (CRUNCH-536) crunch jobs fail to use hbase api of secured hbase
Date Thu, 02 Jul 2015 13:27:04 GMT

     [ https://issues.apache.org/jira/browse/CRUNCH-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jan Van Besien updated CRUNCH-536:
----------------------------------
    Description: 
When accessing a secured HBase cluster from within a MapReduce job, the HBase
credentials must be initialized on the job before it is submitted. This can be done with TableMapReduceUtil.initCredentials(job).

If the job results from using HBaseSourceTarget, crunch-hbase can take care
of this (see CRUNCH-535).

However, it is also possible to write DoFns that use the HBase API directly, without using
the HBase input/output formats. As an example use case, consider a job that bulk-writes data to
HBase by writing HFiles to HDFS, which are later loaded into HBase. Such a job doesn't
read from or write to HBase through an input/output format, but it might still require
access to other tables in HBase, for example auxiliary tables with metadata specific to the
application.

We of course cannot expect crunch-core to call initCredentials (which is HBase-specific)
on every job just in case, but it would be useful to be able to register a callback on the MRPipeline
that is applied to every job before it is submitted, to cover this use case.
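
The callback idea can be illustrated with a minimal, self-contained sketch. It is not the attached patch and not an existing Crunch API: JobPreparer, HookedPipeline, addJobPreparer and FakeJob are hypothetical names invented for illustration, and FakeJob stands in for org.apache.hadoop.mapreduce.Job so the example runs without Hadoop or HBase on the classpath. With the real types, the callback body would presumably be the TableMapReduceUtil.initCredentials(job) call mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback applied to a job before submission (not a Crunch interface). */
interface JobPreparer<J> {
    void prepare(J job) throws Exception;
}

/** Hypothetical stand-in for a pipeline that submits several jobs. */
class HookedPipeline<J> {
    private final List<JobPreparer<J>> preparers = new ArrayList<>();
    private final List<J> submitted = new ArrayList<>();

    /** Register a callback that runs on every job before it is submitted. */
    void addJobPreparer(JobPreparer<J> preparer) {
        preparers.add(preparer);
    }

    /** Apply all registered callbacks, then record the job as submitted. */
    void submit(J job) throws Exception {
        for (JobPreparer<J> preparer : preparers) {
            preparer.prepare(job);
        }
        submitted.add(job);
    }

    List<J> submittedJobs() {
        return submitted;
    }
}

/** Stand-in for a MapReduce job; only tracks whether credentials were set up. */
class FakeJob {
    final String name;
    boolean credentialsInitialized = false;

    FakeJob(String name) {
        this.name = name;
    }
}

class CallbackSketch {
    public static void main(String[] args) throws Exception {
        HookedPipeline<FakeJob> pipeline = new HookedPipeline<>();

        // Assumption: with real Hadoop/HBase types this body would call
        // TableMapReduceUtil.initCredentials(job) instead of setting a flag.
        pipeline.addJobPreparer(job -> job.credentialsInitialized = true);

        pipeline.submit(new FakeJob("write-hfiles"));
        pipeline.submit(new FakeJob("load-metadata"));

        for (FakeJob job : pipeline.submittedJobs()) {
            System.out.println(job.name + " credentials initialized: " + job.credentialsInitialized);
        }
    }
}
```

The point of this shape is that the pipeline core stays unaware of HBase: it only knows that some callbacks must run on each job before submission, and the HBase-specific call lives in user code.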

I will provide a patch which will help to explain what I am suggesting here.

  was:
When accessing a secured hbase from within a mapreduce job, it is required that the hbase
credentials were initialized on the job before it was submitted. This can be done with TableMapReduceUtil.initCredentials(job).

In case the job is the consequence of using HBaseSourceTarget, crunch-hbase can take care
of it, see CRUNCH-535.

However, it is also possible to write DoFn's that use the HBase api directly, without using
hbase input/output format. As an example use case, consider a job that bulk writes data to
hbase by writing HFiles on HDFS which are later to be loaded into HBase. Such a job doesn't
read or write from/to hbase using an input/output format directly, but it might still require
access to other tables in HBase, for example auxiliary tables with metadata specific to the
application. 

We can of course not expect crunch-core to call initCredentials (which is HBase specific)
on all jobs, just in case, but it would be nice to be able to register a callback on the MRPipeline
which is applied to every job before it is submitted, to cover this use case.

I will provide a patch which will help to explain what I am suggesting here.


> crunch jobs fail to use hbase api of secured hbase
> --------------------------------------------------
>
>                 Key: CRUNCH-536
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-536
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Jan Van Besien
>         Attachments: CRUNCH-536.patch
>



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
