hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
Date Fri, 04 Jan 2013 22:14:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544294#comment-13544294 ]

Ted Yu commented on HBASE-3996:
-------------------------------

{code}
+@InterfaceAudience.Public
+public class MultiTableInputFormat extends MultiTableInputFormatBase implements
{code}
When the audience is public, please also add a stability annotation.
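
A minimal sketch of what pairing the two annotations could look like. The nested InterfaceAudience and InterfaceStability types below are local stand-ins so the example compiles on its own; they only mimic the Hadoop classification annotations and are not the real classes. The choice of Evolving is an assumption, since the comment does not say which stability level is wanted.

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

public class StabilityAnnotationSketch {

  // Stand-in for the audience annotation holder (assumption, not the real class).
  static class InterfaceAudience {
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface Public {}
  }

  // Stand-in for the stability annotation holder (assumption, not the real class).
  static class InterfaceStability {
    @Documented
    @Retention(RetentionPolicy.RUNTIME)
    @interface Evolving {}
  }

  // A public-audience class that also declares a stability level.
  // The class body is left empty; only the annotations matter here.
  @InterfaceAudience.Public
  @InterfaceStability.Evolving
  static class MultiTableInputFormat {}

  // Returns true when the class carries the (stand-in) Evolving annotation.
  static boolean hasStability(Class<?> c) {
    return c.isAnnotationPresent(InterfaceStability.Evolving.class);
  }

  public static void main(String[] args) {
    System.out.println(hasStability(MultiTableInputFormat.class));
  }
}
```

In the actual patch the annotations would come from the project's classification package rather than from local stand-ins.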
W.r.t. the Version enum: previously only HLogKey used it. I think the enum would not stay
in sync between HLogKey and TableSplit. See the following from HLogKey:
{code}
    COMPRESSED(-2);
{code}
So we can keep separate enums for now.
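
A rough sketch of why two independent enums are easier to live with than one shared one. WalKeyVersion and SplitVersion are hypothetical names standing in for the enums inside HLogKey and TableSplit. Only COMPRESSED(-2) is taken from the snippet quoted from HLogKey; every other constant and the isKnown helper are assumptions made for illustration.

```java
public class SeparateVersionEnums {

  // Stand-in for the enum in HLogKey. COMPRESSED(-2) comes from the quoted
  // snippet; UNVERSIONED and INITIAL are assumed for illustration.
  enum WalKeyVersion {
    UNVERSIONED(0), INITIAL(-1), COMPRESSED(-2);

    final int code;

    WalKeyVersion(int code) { this.code = code; }

    // True when some constant of this enum carries the given code.
    static boolean isKnown(int code) {
      for (WalKeyVersion v : values()) {
        if (v.code == code) return true;
      }
      return false;
    }
  }

  // Stand-in for the enum in TableSplit. It has no COMPRESSED constant,
  // because a log-compression flag means nothing for an input split.
  enum SplitVersion {
    UNVERSIONED(0), INITIAL(-1);

    final int code;

    SplitVersion(int code) { this.code = code; }

    // True when some constant of this enum carries the given code.
    static boolean isKnown(int code) {
      for (SplitVersion v : values()) {
        if (v.code == code) return true;
      }
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(WalKeyVersion.isKnown(-2)); // the WAL key knows -2
    System.out.println(SplitVersion.isKnown(-2));  // the split does not
  }
}
```

With separate enums, each serialization format can add versions on its own schedule without forcing a meaningless constant onto the other.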

Running TestMultiTableInputFormat, I saw several test failures.
{code}
  <testcase time="112.065" classname="org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat" name="testScanEmptyToEmpty">
    <failure type="java.lang.AssertionError">java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat.testScan(TestMultiTableInputFormat.java:252)
  at org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat.testScanEmptyToEmpty(TestMultiTableInputFormat.java:177)
{code}
TestTableInputFormat passed locally.

Here is OS info:

Darwin TYus-MacBook-Pro.local 12.2.1 Darwin Kernel Version 12.2.1: Thu Oct 18 12:13:47 PDT 2012; root:xnu-2050.20.9~1/RELEASE_X86_64 x86_64
                
> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>
>                 Key: HBASE-3996
>                 URL: https://issues.apache.org/jira/browse/HBASE-3996
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Eran Kutner
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.5
>
>         Attachments: 3996-v10.txt, 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt, 3996-v6.txt, 3996-v7.txt, 3996-v8.txt, 3996-v9.txt, HBase-3996.patch
>
>
> It seems that in many cases feeding data from multiple tables or multiple scanners on a single table can save a lot of time when running map/reduce jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
