hbase-user mailing list archives

From Jean-Daniel Cryans <jdcry...@apache.org>
Subject Re: Reduce-side-join, input from hbase and hdfs
Date Mon, 17 Oct 2011 17:34:39 GMT
A job cannot have two input formats, so at this point you would need to
write your own input format that handles both HDFS files and HBase tables.
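
As a rough illustration of what such a combined input format has to do,
here is a dependency-free model in plain Java. It is not Hadoop or HBase
code: Source, TaggedSplit, getSplits and read are hypothetical names that
only mirror the two jobs of an input format (listing splits, then reading
records from one split). A real implementation would extend the Hadoop
InputFormat class and delegate to the file-based and table-based formats.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Dependency-free model of a "combined" input format. Nothing here is a
// Hadoop or HBase class; every name is hypothetical.
final class CombinedInputSketch {

    // Stand-in for one input source: split id -> records in that split.
    static final class Source {
        final String name;                       // e.g. "HDFS" or "HBASE"
        final Map<String, List<String>> splits;  // e.g. file blocks or table regions

        Source(String name, Map<String, List<String>> splits) {
            this.name = name;
            this.splits = splits;
        }
    }

    // A split that remembers which source produced it.
    static final class TaggedSplit {
        final Source source;
        final String splitId;

        TaggedSplit(Source source, String splitId) {
            this.source = source;
            this.splitId = splitId;
        }
    }

    // Plays the role of getSplits(): the union of the splits of every source.
    static List<TaggedSplit> getSplits(List<Source> sources) {
        List<TaggedSplit> all = new ArrayList<>();
        for (Source s : sources) {
            for (String id : s.splits.keySet()) {
                all.add(new TaggedSplit(s, id));
            }
        }
        return all;
    }

    // Plays the role of the record reader: delegate to the owning source and
    // tag each record so a later join can tell the two inputs apart.
    static List<String> read(TaggedSplit split) {
        List<String> out = new ArrayList<>();
        for (String record : split.source.splits.get(split.splitId)) {
            out.add(split.source.name + ":" + record);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> files = new LinkedHashMap<>();
        files.put("part-00000", Arrays.asList("u1,click", "u3,view"));
        Map<String, List<String>> regions = new LinkedHashMap<>();
        regions.put("region-a", Arrays.asList("u1,alice"));

        List<Source> sources =
                Arrays.asList(new Source("HDFS", files), new Source("HBASE", regions));
        for (TaggedSplit split : getSplits(sources)) {
            System.out.println(split.splitId + " -> " + read(split));
        }
    }
}
```

The point of the tagged split is that the job still sees a single input
format, while each split carries enough information to be read by the
right underlying reader.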

There is currently no MultipleTableInputFormat, and it would not solve
your problem anyway because it would not take HDFS inputs.

Your other option (writing the scan results to HDFS first, then joining
with MultipleInputs) sounds right, although it will be slower, as you
mentioned.
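
Whichever way the two inputs reach the job, the join itself follows the
usual reduce-side pattern: each mapper tags its output with the source it
came from, and the reducer pairs up the values that share a key. The
sketch below models that in plain Java with no Hadoop or HBase classes.
The "HBASE" and "HDFS" tags, the {key, value} row layout and all method
names are assumptions made for illustration, and it performs an inner
join only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Dependency-free model of a reduce-side join; all names are hypothetical.
final class ReduceSideJoinSketch {

    // A map-output value tagged with the source it came from.
    static final class Tagged {
        final String source; // "HBASE" or "HDFS" (assumed tag values)
        final String value;

        Tagged(String source, String value) {
            this.source = source;
            this.value = value;
        }
    }

    // "Map" phase: emit (joinKey, taggedValue) for each {key, value} row.
    static void map(String source, List<String[]> rows, Map<String, List<Tagged>> shuffle) {
        for (String[] row : rows) {
            shuffle.computeIfAbsent(row[0], k -> new ArrayList<>())
                   .add(new Tagged(source, row[1]));
        }
    }

    // "Reduce" phase for one key: pair every HBase value with every HDFS value.
    static List<String> reduce(String key, List<Tagged> values) {
        List<String> fromHBase = new ArrayList<>();
        List<String> fromHdfs = new ArrayList<>();
        for (Tagged t : values) {
            if (t.source.equals("HBASE")) {
                fromHBase.add(t.value);
            } else {
                fromHdfs.add(t.value);
            }
        }
        List<String> out = new ArrayList<>();
        for (String l : fromHBase) {
            for (String r : fromHdfs) {
                out.add(key + "\t" + l + "\t" + r);
            }
        }
        return out;
    }

    // Runs both "mappers", groups by key, then reduces each key in sorted order.
    static List<String> join(List<String[]> hbaseRows, List<String[]> hdfsRows) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map("HBASE", hbaseRows, shuffle);
        map("HDFS", hdfsRows, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String[]> hbase = Arrays.asList(new String[]{"u1", "alice"}, new String[]{"u2", "bob"});
        List<String[]> hdfs = Arrays.asList(new String[]{"u1", "click"}, new String[]{"u3", "view"});
        join(hbase, hdfs).forEach(System.out::println);
    }
}
```

Keys that appear in only one input (u2 and u3 in the example) produce no
output, which is what you would expect from an inner join.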

J-D

On Sun, Oct 16, 2011 at 2:48 AM, Christopher Dorner <christopher.dorner@gmail.com> wrote:

> Hi,
>
> I am considering a reduce-side join where one input would be read from
> HDFS and the other from an HBase table.
>
> Is it somehow possible to use
>
> TableMapReduceUtil.initTableMapperJob(table, scan, Mapper_HBase.class,
> ..., job);
>
> and
>
> MultipleInputs.addInputPath(job, path, ..., Mapper_HDFS.class)
>
> at the same time in one job?
> It seems that MultipleInputs took priority when I tried to use both:
> Mapper_HBase was not executed. It does execute when I remove the
> MultipleInputs call.
>
>
> Also, is there something equivalent to MultipleInputs for HBase tables,
> e.g. MultipleTableInputs? I saw there was a request for it here:
> https://issues.apache.org/jira/browse/HBASE-2965
>
>
> A workaround would be to write the scan results to HDFS first and do the
> reduce-side join using MultipleInputs, but I wanted to avoid this
> additional I/O overhead.
>
> Thanks,
> Christopher
