hbase-user mailing list archives

From Shahab Yunus <shahab.yu...@gmail.com>
Subject Re: Hbase MR Job with 2 OutputForm classes possible?
Date Thu, 31 Jul 2014 00:53:57 GMT
There is a trick: you can use MultipleOutputs together with
TableMapReduceUtil. In the Reducer (or the Mapper, for a map-only job) you
write to the desired HDFS outputs through MultipleOutputs, and the
TableOutputFormat set up by TableMapReduceUtil keeps doing its work as is.

The only caveat is that you will have to commit the files written through
MultipleOutputs yourself (which you can also do by extending
TableOutputFormat).
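
A rough, untested sketch of how this could look for the map-only job quoted
below. Everything named here is an assumption made for illustration: the
class names DualOutputSketch and TableAndFileOutputFormat, the named output
"hive", the column family "cf" and qualifier "q", and the use of the input
line as row key. The idea is that a subclass of TableOutputFormat hands back
a FileOutputCommitter, so the files MultipleOutputs writes under the
temporary directory get moved into the HDFS output directory at commit time.

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.OutputCommitter;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class DualOutputSketch {

    // Hypothetical named output; MultipleOutputs accepts letters and digits only.
    static final String HDFS_OUT = "hive";

    // Still writes Puts to HBase, but commits files like a FileOutputFormat would.
    public static class TableAndFileOutputFormat<KEY> extends TableOutputFormat<KEY> {
        @Override
        public OutputCommitter getOutputCommitter(TaskAttemptContext context)
                throws IOException, InterruptedException {
            return new FileOutputCommitter(FileOutputFormat.getOutputPath(context), context);
        }
    }

    public static class MyMap
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        private MultipleOutputs<ImmutableBytesWritable, Put> mos;

        @Override
        protected void setup(Context context) {
            mos = new MultipleOutputs<ImmutableBytesWritable, Put>(context);
        }

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Placeholder for the real Get + transformation logic.
            byte[] row = Bytes.toBytes(line.toString());
            Put put = new Put(row);
            // addColumn is the HBase 1.0+ name; 0.9x releases call it Put.add.
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), row);

            context.write(new ImmutableBytesWritable(row), put);   // goes to HBase
            mos.write(HDFS_OUT, NullWritable.get(), new Text(line)); // goes to HDFS
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            mos.close(); // flushes and closes the HDFS record writers
        }
    }

    // Call after TableMapReduceUtil.initTableReducerJob(tableName, null, job).
    static void addHdfsOutput(Job job, Path hdfsOutputDir) {
        job.setOutputFormatClass(TableAndFileOutputFormat.class);
        FileOutputFormat.setOutputPath(job, hdfsOutputDir);
        MultipleOutputs.addNamedOutput(job, HDFS_OUT,
                TextOutputFormat.class, NullWritable.class, Text.class);
    }
}
```

With this arrangement the Get happens once per record and both sinks are fed
from the same map() call. Whether the committer swap behaves correctly
depends on your Hadoop and HBase versions, so verify it on a small input
before relying on it.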

Regards,
Shahab


On Wed, Jul 30, 2014 at 8:50 PM, Thomas Kwan <thomas.kwan@manage.com> wrote:

> Hi there,
>
> I have an HBase MR job that reads data from HDFS, does an HBase Get, and then
> does some data transformation. Then I need to put the data back into HBase as
> well as write it to an HDFS directory (so I can import it back into
> Hive).
>
> The current job creation logic is similar to the following:
>
>     public static Job createHBaseJob(Configuration conf, String []args)
>     throws IOException {
>         Path inputDir = new Path(args[0]);
>         String tableName = args[1];
>         String params = args[2];
>
>         Job job = new Job(conf, NAME + "_" + tableName + " " + params);
>         job.setJarByClass(MyMap.class);
>         job.setInputFormatClass(TextInputFormat.class);
>         job.setMapperClass(MyMap.class);
>
>         FileInputFormat.setInputPaths(job, inputDir);
>
>         // No reducers. Just write straight to the table. Call
>         // initTableReducerJob to set up the TableOutputFormat.
>         TableMapReduceUtil.initTableReducerJob(tableName, null, job);
>         job.setNumReduceTasks(0);
>
>         TableMapReduceUtil.addDependencyJars(job);
>         return job;
>     }
>
> TableMapReduceUtil.initTableReducerJob already sets up the OutputFormat
> class. I wonder if there is some magic I can do to pipe the data to an HDFS
> file as well. Currently I have 2 jobs: one writes to HBase and one
> writes to HDFS. But with the current setup, I need to do the HBase Get twice.
>
> Any input is highly welcome!!
>
> thomas
>
