hbase-issues mailing list archives

From "Ted Yu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-3996) Support multiple tables and scanners as input to the mapper in map/reduce jobs
Date Fri, 04 Jan 2013 22:14:13 GMT

    [ https://issues.apache.org/jira/browse/HBASE-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13544294#comment-13544294 ]

Ted Yu commented on HBASE-3996:

+public class MultiTableInputFormat extends MultiTableInputFormatBase implements
When the audience is public, please add a stability annotation.
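A minimal sketch of what that could look like, assuming Hadoop-style InterfaceAudience / InterfaceStability annotations. The two annotation holders and the empty base class below are local stand-ins so the snippet compiles without HBase on the classpath, the choice of Evolving is an assumption, and the truncated implements clause from the patch is left out:

```java
import java.lang.annotation.Documented;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Hypothetical stand-in for the project's audience annotations.
final class InterfaceAudience {
  private InterfaceAudience() {}

  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @interface Public {}
}

// Hypothetical stand-in for the project's stability annotations.
final class InterfaceStability {
  private InterfaceStability() {}

  @Documented
  @Retention(RetentionPolicy.RUNTIME)
  @interface Evolving {}
}

// Placeholder for the base class introduced by the patch.
abstract class MultiTableInputFormatBase {}

// The class in the patch is declared public; it is package-private here only
// so the sketch fits in a single source file.
@InterfaceAudience.Public
@InterfaceStability.Evolving // assumed level; the reviewer did not name one
class MultiTableInputFormat extends MultiTableInputFormatBase {}
```

The audience annotation says who may depend on the class, and the stability annotation says how freely its API may change between releases, which is why a public-audience class is expected to carry both.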
W.r.t. enum Version: previously only HLogKey used the enum. I think the enum would not stay
in sync between HLogKey and TableSplit. See the following from HLogKey:
So we can keep separate enums for now.
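To illustrate the point about separate enums, here is a hypothetical sketch. It does not reproduce the HLogKey or TableSplit code; the enum names, constants, and numeric codes are all invented for illustration. Each serialized type owns its version enum, so a version added for one type is never accepted by the other:

```java
// Hypothetical version enum for a log-key-like type.
enum LogKeyVersion {
  V1(1),
  V2_COMPRESSED(2); // a later format change that only concerns this type

  final int code;

  LogKeyVersion(int code) {
    this.code = code;
  }

  // Maps a serialized code back to a version, rejecting unknown codes.
  static LogKeyVersion fromCode(int code) {
    for (LogKeyVersion v : values()) {
      if (v.code == code) {
        return v;
      }
    }
    throw new IllegalArgumentException("Unknown log key version: " + code);
  }
}

// Hypothetical version enum for a split-like type, evolving independently.
enum SplitVersion {
  V1(1);

  final int code;

  SplitVersion(int code) {
    this.code = code;
  }

  // Maps a serialized code back to a version, rejecting unknown codes.
  static SplitVersion fromCode(int code) {
    for (SplitVersion v : values()) {
      if (v.code == code) {
        return v;
      }
    }
    throw new IllegalArgumentException("Unknown split version: " + code);
  }
}
```

With a shared enum, adding a constant for one type would silently make that code valid for the other as well; with separate enums, SplitVersion.fromCode(2) fails until the split format actually gains a second version.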

Running TestMultiTableInputFormat, I saw several test failures.
  <testcase time="112.065" classname="org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat"
    <failure type="java.lang.AssertionError">java.lang.AssertionError
  at org.junit.Assert.fail(Assert.java:92)
  at org.junit.Assert.assertTrue(Assert.java:43)
  at org.junit.Assert.assertTrue(Assert.java:54)
  at org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat.testScan(TestMultiTableInputFormat.java:252)
  at org.apache.hadoop.hbase.mapreduce.TestMultiTableInputFormat.testScanEmptyToEmpty(TestMultiTableInputFormat.java:177)
TestTableInputFormat passed locally.

Here is OS info:

Darwin TYus-MacBook-Pro.local 12.2.1 Darwin Kernel Version 12.2.1: Thu Oct 18 12:13:47 PDT
2012; root:xnu-2050.20.9~1/RELEASE_X86_64 x86_64
> Support multiple tables and scanners as input to the mapper in map/reduce jobs
> ------------------------------------------------------------------------------
>                 Key: HBASE-3996
>                 URL: https://issues.apache.org/jira/browse/HBASE-3996
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Eran Kutner
>            Assignee: Lars Hofhansl
>             Fix For: 0.96.0, 0.94.5
>         Attachments: 3996-v10.txt, 3996-v2.txt, 3996-v3.txt, 3996-v4.txt, 3996-v5.txt,
3996-v6.txt, 3996-v7.txt, 3996-v8.txt, 3996-v9.txt, HBase-3996.patch
> It seems that in many cases feeding data from multiple tables or multiple scanners on
a single table can save a lot of time when running map/reduce jobs.
> I propose a new MultiTableInputFormat class that would allow doing this.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
