hbase-issues mailing list archives

From "stack (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
Date Wed, 22 Jun 2011 04:33:47 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13053039#comment-13053039 ]

stack commented on HBASE-3996:
------------------------------

FYI, the patch has a bunch of tabs in it instead of two-space indentation, and some lines are
> 80 chars, but no biggie; I can fix that on commit.  Here are a few comments.

In TableSplit you create an HTable instance.  Do you need to?  And when you create it, though
I believe it will be less of a problem going forward, can you use the constructor that takes
a Configuration and table name?  Is there a close in the Split interface?  If so, you might want
to call close on your HTable in there.  (Where is it used?  Does each split need its own HTable?)
Use the constructor that takes a Configuration here too:

+    HTable table = new HTable(tic.getTableName());$

You don't need the e.printStackTrace() in the below:

{code}
+    Log.warn("Failed to convert Scan to Strting", e);$
+    e.printStackTrace();$
{code}
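
To illustrate why the extra call is redundant: handing the exception to the logger already records its stack trace. The sketch below uses java.util.logging as a stand-in for whatever logging library the patch uses (an assumption made to keep the example self-contained); the class and method names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;
import java.util.logging.StreamHandler;

public class LogWithThrowable {
  /** Logs a warning with the exception attached and returns what the logger wrote. */
  public static String logFailure(Exception e) {
    ByteArrayOutputStream captured = new ByteArrayOutputStream();
    StreamHandler handler = new StreamHandler(captured, new SimpleFormatter());
    Logger log = Logger.getLogger("LogWithThrowable.demo");
    log.setUseParentHandlers(false); // keep the demo output out of the console
    log.addHandler(handler);
    try {
      // Passing the exception to the logger records its stack trace;
      // no separate e.printStackTrace() call is needed.
      log.log(Level.WARNING, "Failed to convert Scan to String", e);
    } finally {
      handler.flush();
      log.removeHandler(handler);
    }
    return captured.toString();
  }

  public static void main(String[] args) {
    // Prints the warning message followed by the exception and its stack frames.
    System.out.print(logFailure(new IOException("bad scan")));
  }
}
```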

Nice javadoc.

By any chance is the code here in MultiTableInputFormatBase where we are checking start and
end rows copied from elsewhere? 
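
For context, a start/end row check of this kind usually decides whether a region's key range intersects the scan's range. The following is only a hypothetical sketch of such a check, not the patch's code and not the existing HBase code; it assumes an empty byte array means "unbounded" and that stop rows are exclusive, and all names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class RowRangeOverlap {
  /** Lexicographic comparison of two row keys, treating bytes as unsigned. */
  public static int compareRows(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /**
   * True if the scan range [scanStart, scanStop) overlaps the region range
   * [regionStart, regionEnd). An empty array means "unbounded" on that side.
   */
  public static boolean overlaps(byte[] scanStart, byte[] scanStop,
                                 byte[] regionStart, byte[] regionEnd) {
    boolean startsBeforeRegionEnds =
        scanStart.length == 0 || regionEnd.length == 0
            || compareRows(scanStart, regionEnd) < 0;
    boolean stopsAfterRegionStarts =
        scanStop.length == 0 || compareRows(scanStop, regionStart) > 0;
    return startsBeforeRegionEnds && stopsAfterRegionStarts;
  }

  private static byte[] row(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Scan [b, d) against region [c, e): the ranges share [c, d).
    System.out.println(overlaps(row("b"), row("d"), row("c"), row("e")));
    // Scan [b, c) against region [c, e): the stop row is exclusive, so no overlap.
    System.out.println(overlaps(row("b"), row("c"), row("c"), row("e")));
  }
}
```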

Otherwise patch looks great.  Test too.


On the e.printStackTrace() in the snippet: the Log.warn line above it already outputs the stack trace (fix the "Strting" spelling too!).

You remove the hashCode in TableSplit.  Should it have one?
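
If it should keep one, a hashCode consistent with equals could look roughly like the sketch below. This is a simplified stand-in class, not the real TableSplit; the field set (table name, start row, end row, region location) is an assumption for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class SimpleTableSplit {
  private final byte[] tableName;
  private final byte[] startRow;
  private final byte[] endRow;
  private final String regionLocation;

  public SimpleTableSplit(byte[] tableName, byte[] startRow, byte[] endRow,
                          String regionLocation) {
    // Defensive copies so later changes to the caller's arrays do not alter the hash.
    this.tableName = tableName.clone();
    this.startRow = startRow.clone();
    this.endRow = endRow.clone();
    this.regionLocation = regionLocation;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof SimpleTableSplit)) {
      return false;
    }
    SimpleTableSplit other = (SimpleTableSplit) o;
    return Arrays.equals(tableName, other.tableName)
        && Arrays.equals(startRow, other.startRow)
        && Arrays.equals(endRow, other.endRow)
        && Objects.equals(regionLocation, other.regionLocation);
  }

  @Override
  public int hashCode() {
    // Arrays.hashCode hashes array contents; byte[].hashCode() is identity-based.
    int result = Arrays.hashCode(tableName);
    result = 31 * result + Arrays.hashCode(startRow);
    result = 31 * result + Arrays.hashCode(endRow);
    result = 31 * result + Objects.hashCode(regionLocation);
    return result;
  }

  public static void main(String[] args) {
    Set<SimpleTableSplit> seen = new HashSet<>();
    seen.add(new SimpleTableSplit("t1".getBytes(), "a".getBytes(), "m".getBytes(), "host1"));
    // An equal split built from fresh arrays is found only because hashCode matches equals.
    System.out.println(seen.contains(
        new SimpleTableSplit("t1".getBytes(), "a".getBytes(), "m".getBytes(), "host1")));
  }
}
```

Without a content-based hashCode, equal splits would land in different buckets of a HashSet or HashMap, which matters if splits are ever used as keys or de-duplicated.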



> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-3996
>                 URL: https://issues.apache.org/jira/browse/HBASE-3996
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Eran Kutner
>             Fix For: 0.90.4
>
>         Attachments: MultiTableInputFormat.patch, TestMultiTableInputFormat.java.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
