crunch-dev mailing list archives

From "Josh Wills (JIRA)" <>
Subject [jira] [Updated] (CRUNCH-536) crunch jobs fail to use hbase api of secured hbase
Date Mon, 06 Jul 2015 18:53:06 GMT


Josh Wills updated CRUNCH-536:
    Attachment: CRUNCH-536b.patch

Here's my take on the above.

> crunch jobs fail to use hbase api of secured hbase
> --------------------------------------------------
>                 Key: CRUNCH-536
>                 URL:
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Jan Van Besien
>         Attachments: CRUNCH-536.patch, CRUNCH-536b.patch
> When accessing a secured HBase from within a MapReduce job, the HBase credentials must be initialized on the job before it is submitted. This can be done with TableMapReduceUtil.initCredentials(job).
> If the job results from using HBaseSourceTarget, crunch-hbase can take care of it; see CRUNCH-535.
> However, it is also possible to write DoFns that use the HBase API directly, without using an HBase input/output format. As an example use case, consider a job that bulk-writes data to HBase by writing HFiles on HDFS, which are later loaded into HBase. Such a job doesn't read from or write to HBase through an input/output format, but it might still require access to other tables in HBase, for example auxiliary tables with metadata specific to the
> Of course, we cannot expect crunch-core to call initCredentials (which is HBase-specific) on all jobs just in case, but it would be nice to be able to register a callback on the MRPipeline that is applied to every job before it is submitted, to cover this use case.
> I will provide a patch which will help to explain what I am suggesting here.
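
For illustration only, the callback idea in the description could look roughly like the sketch below. This is not the attached patch and not the Crunch API: FakeJob, FakePipeline, JobPrepareHook and addPrepareHook are hypothetical stand-ins written in plain Java so that the example runs without Hadoop or HBase on the classpath. In a real pipeline the hook body is where TableMapReduceUtil.initCredentials(job) would presumably be called.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JobPrepareHookSketch {

    /** Hypothetical stand-in for a MapReduce job; holds only what the sketch needs. */
    static final class FakeJob {
        final String name;
        final Map<String, String> credentials = new LinkedHashMap<>();

        FakeJob(String name) {
            this.name = name;
        }
    }

    /** Hypothetical callback type: invoked once per job, just before submission. */
    interface JobPrepareHook {
        void prepare(FakeJob job);
    }

    /** Hypothetical stand-in for a pipeline that plans several jobs and submits them. */
    static final class FakePipeline {
        private final List<JobPrepareHook> hooks = new ArrayList<>();
        private final List<FakeJob> planned = new ArrayList<>();
        final List<FakeJob> submitted = new ArrayList<>();

        void addPrepareHook(JobPrepareHook hook) {
            hooks.add(hook);
        }

        void plan(String jobName) {
            planned.add(new FakeJob(jobName));
        }

        void run() {
            for (FakeJob job : planned) {
                // Every registered hook sees every job before it is "submitted".
                for (JobPrepareHook hook : hooks) {
                    hook.prepare(job);
                }
                submitted.add(job);
            }
        }
    }

    /** Builds a two-job pipeline with one hook and returns the submitted jobs. */
    static List<FakeJob> demo() {
        FakePipeline pipeline = new FakePipeline();
        // Assumption: a real hook would call TableMapReduceUtil.initCredentials(job) here.
        pipeline.addPrepareHook(job -> job.credentials.put("hbase", "token-for-" + job.name));
        pipeline.plan("write-hfiles");
        pipeline.plan("load-hfiles");
        pipeline.run();
        return pipeline.submitted;
    }

    public static void main(String[] args) {
        for (FakeJob job : demo()) {
            System.out.println(job.name + " " + job.credentials);
        }
    }
}
```

The point of the design is that the core pipeline only knows about a generic hook, so HBase-specific setup stays in user code or in crunch-hbase.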

This message was sent by Atlassian JIRA
