hbase-issues mailing list archives

From "Akash Ashok (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HBASE-2965) Implement MultipleTableInputs which is analogous to MultipleInputs in Hadoop
Date Sat, 05 Feb 2011 02:55:30 GMT

    [ https://issues.apache.org/jira/browse/HBASE-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12990891#comment-12990891 ]

Akash Ashok commented on HBASE-2965:
------------------------------------

Hi, I was looking for exactly this. As we are moving from processing data directly on Hadoop
to using HBase, we urgently need MultipleTableInputs for our reduce-side joins.
Could someone please tell me when this will be implemented?

Also, can I change this feature's priority from Minor to Major? It is a very important feature.
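
For context, a reduce-side join depends on every map output value being tagged with the table it came from, so the reducer can pair up rows that share a key. Per-table inputs and mappers would make that tagging straightforward. Below is a minimal in-memory sketch in plain Java (no Hadoop or HBase classes). The tables `users` and `orders`, the `Row`/`Tagged` records, and the output format are hypothetical, and a `TreeMap` stands in for the shuffle:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a reduce-side join; all names here are hypothetical.
class ReduceSideJoinSketch {

    // One row of a (hypothetical) table: the join key plus a payload.
    record Row(String key, String value) {}

    // A map output value tagged with the table it was read from.
    record Tagged(String table, String value) {}

    // "Map" phase for one table. The TreeMap stands in for the shuffle,
    // which groups (and sorts) values by key before the reduce phase.
    static void map(String table, List<Row> rows, Map<String, List<Tagged>> shuffle) {
        for (Row row : rows) {
            shuffle.computeIfAbsent(row.key(), k -> new ArrayList<>())
                   .add(new Tagged(table, row.value()));
        }
    }

    // "Reduce" phase for one key: the tag says which side each value belongs to,
    // so every left-side value is paired with every right-side value (inner join).
    static List<String> reduce(String key, List<Tagged> values, String left, String right) {
        List<String> lefts = new ArrayList<>();
        List<String> rights = new ArrayList<>();
        for (Tagged t : values) {
            if (t.table().equals(left)) {
                lefts.add(t.value());
            } else if (t.table().equals(right)) {
                rights.add(t.value());
            }
        }
        List<String> joined = new ArrayList<>();
        for (String l : lefts) {
            for (String r : rights) {
                joined.add(key + "," + l + "," + r);
            }
        }
        return joined;
    }

    // Runs both "map" phases, then one "reduce" call per key in sorted key order.
    static List<String> join(String left, List<Row> leftRows, String right, List<Row> rightRows) {
        Map<String, List<Tagged>> shuffle = new TreeMap<>();
        map(left, leftRows, shuffle);
        map(right, rightRows, shuffle);
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle.entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue(), left, right));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> users = List.of(new Row("u1", "alice"), new Row("u2", "bob"));
        List<Row> orders = List.of(new Row("u1", "book"), new Row("u1", "pen"), new Row("u3", "cup"));
        // Prints u1,alice,book and u1,alice,pen; u2 and u3 have no match on the other side.
        join("users", users, "orders", orders).forEach(System.out::println);
    }
}
```

Without the tag, the reducer would receive an undifferentiated list of values per key and could not tell a user row from an order row.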

> Implement MultipleTableInputs which is analogous to MultipleInputs in Hadoop
> ----------------------------------------------------------------------------
>
>                 Key: HBASE-2965
>                 URL: https://issues.apache.org/jira/browse/HBASE-2965
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapred, mapreduce
>            Reporter: Adam Warrington
>            Priority: Minor
>
> This feature would be helpful for doing reduce-side joins, or even for passing similarly
> structured data from multiple tables through MapReduce. The API I envision would be very
> similar to the existing MultipleInputs, parts of which could be reused.
> MultipleTableInputs would have a public API like:
> public class MultipleTableInputs {
>   public static void addInputTable(Job job, Table table, Scan scan,
>       Class<? extends TableInputFormatBase> inputFormatClass,
>       Class<? extends Mapper> mapperClass) {
>     // record table -> (scan, format, mapper); set DelegatingTableInputFormat as input format
>   }
> }
> MultipleTableInputs would build a mapping of tables to configured TableInputFormats, the
> same way MultipleInputs builds a mapping between Paths and InputFormats. Since most people
> will probably use TableInputFormat.class as the input format class, the MultipleTableInputs
> implementation will have to replace TableInputFormatBase's private scan and table members,
> which are configured when an instance of TableInputFormat is created (in its setConf()
> method), by calling setScan() and setHTable() with the table and scan passed into
> addInputTable() above. MultipleTableInputs.addInputTable() would also set the job's input
> format to DelegatingTableInputFormat, described below.
> A new class called DelegatingTableInputFormat would be analogous to DelegatingInputFormat:
> its getSplits() would return TaggedInputSplits (the same TaggedInputSplit class that Hadoop's
> DelegatingInputFormat uses), which tag each split with its InputFormat and Mapper. These are
> created by looping through the HTable-to-InputFormat mappings, calling getSplits() on each
> input format, and passing the split, the input format, and the mapper as constructor
> arguments to TaggedInputSplit.
> The createRecordReader() method in DelegatingTableInputFormat could have the same
> implementation as the one in Hadoop's DelegatingInputFormat.
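
The delegation described in the issue can be simulated in plain Java to show how the pieces fit together. This is a minimal sketch, not HBase's or Hadoop's code. It assumes hypothetical stand-ins: `TableFormat` for TableInputFormatBase, `RowMapper` for Mapper, `TaggedSplit` for TaggedInputSplit, and a `Map<String, String>` for the job Configuration with a made-up key `sketch.table.formats`. The string encoding mirrors the way MultipleInputs records its path-to-format pairs in the configuration. In Hadoop, the mapper side of this is handled by DelegatingMapper, which `process()` folds in here:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java simulation of MultipleTableInputs + DelegatingTableInputFormat; all types are hypothetical.
class DelegatingTableInputSketch {

    // Hypothetical stand-in for TableInputFormatBase: splits a table and reads one split.
    interface TableFormat {
        List<String> getSplits(String table);
        List<String> read(String split);
    }

    // Hypothetical stand-in for a Mapper: turns one record into one output value.
    interface RowMapper {
        String map(String record);
    }

    // Like Hadoop's TaggedInputSplit: a split plus the format and mapper that should handle it.
    record TaggedSplit(String split, String formatClass, String mapperClass) {}

    static final String TABLE_FORMATS = "sketch.table.formats"; // hypothetical config key

    // Analogue of addInputTable(): appends "table;format;mapper" to a comma-separated
    // config value, so the mapping can travel with the job configuration.
    static void addInputTable(Map<String, String> conf, String table,
                              Class<? extends TableFormat> format, Class<? extends RowMapper> mapper) {
        String entry = table + ";" + format.getName() + ";" + mapper.getName();
        conf.merge(TABLE_FORMATS, entry, (old, add) -> old + "," + add);
    }

    // Analogue of DelegatingTableInputFormat.getSplits(): ask each table's format for its
    // splits and tag every split with the format and mapper registered for that table.
    static List<TaggedSplit> getSplits(Map<String, String> conf) throws ReflectiveOperationException {
        List<TaggedSplit> tagged = new ArrayList<>();
        for (String entry : conf.getOrDefault(TABLE_FORMATS, "").split(",")) {
            if (entry.isEmpty()) {
                continue;
            }
            String[] parts = entry.split(";");
            TableFormat format = newInstance(parts[1], TableFormat.class);
            for (String split : format.getSplits(parts[0])) {
                tagged.add(new TaggedSplit(split, parts[1], parts[2]));
            }
        }
        return tagged;
    }

    // Analogue of createRecordReader() plus the delegating mapper: the tag alone decides
    // which format reads the split and which mapper processes its records.
    static List<String> process(TaggedSplit ts) throws ReflectiveOperationException {
        TableFormat format = newInstance(ts.formatClass(), TableFormat.class);
        RowMapper mapper = newInstance(ts.mapperClass(), RowMapper.class);
        List<String> out = new ArrayList<>();
        for (String record : format.read(ts.split())) {
            out.add(mapper.map(record));
        }
        return out;
    }

    // Rebuilds an object from the class name stored in the config or split tag.
    static <T> T newInstance(String className, Class<T> type) throws ReflectiveOperationException {
        return type.cast(Class.forName(className).getDeclaredConstructor().newInstance());
    }

    // Toy format: two splits per table, each holding one record.
    static class TwoSplitFormat implements TableFormat {
        public List<String> getSplits(String table) { return List.of(table + "#0", table + "#1"); }
        public List<String> read(String split) { return List.of("row@" + split); }
    }

    static class UsersMapper implements RowMapper {
        public String map(String record) { return "user:" + record; }
    }

    static class OrdersMapper implements RowMapper {
        public String map(String record) { return "order:" + record; }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Map<String, String> conf = new LinkedHashMap<>();
        addInputTable(conf, "users", TwoSplitFormat.class, UsersMapper.class);
        addInputTable(conf, "orders", TwoSplitFormat.class, OrdersMapper.class);
        for (TaggedSplit ts : getSplits(conf)) {
            System.out.println(process(ts));
        }
    }
}
```

Because only class names travel in the configuration and in each split's tag, a task can rebuild the right format and mapper from the split alone, which is why the split carries the tag rather than live objects.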

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
