hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-13356) HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots
Date Sun, 26 Apr 2015 03:52:38 GMT

    [ https://issues.apache.org/jira/browse/HBASE-13356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512859#comment-14512859 ]

Hadoop QA commented on HBASE-13356:

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  against master branch at commit cd83d39fb4f50db901b699ba5470b5f709c95c69.
  ATTACHMENT ID: 12728190

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 12 new or modified

    {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions
(2.4.1 2.5.2 2.6.0)

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 protoc{color}.  The applied patch does not increase the total number of
protoc compiler warnings.

    {color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 4 warning messages.

    {color:red}-1 checkstyle{color}.  The applied patch generated 1965 checkstyle
errors (more than the master's current 1900 errors).

    {color:green}+1 findbugs{color}.  The patch does not introduce any  new Findbugs (version
2.0.3) warnings.

    {color:red}-1 release audit{color}.  The applied patch generated 7 release audit warnings
(more than the master's current 0 warnings).

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than
+ * MultiTableSnapshotInputFormat generalizes {@link org.apache.hadoop.hbase.mapred.TableSnapshotInputFormat}
+ * allowing a MapReduce job to run over one or more table snapshots, with one or more scans configured for each.
+ * Internally, the input format delegates to {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}
+ * and thus has the same performance advantages; see {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}
+ * Usage is similar to TableSnapshotInputFormat, with the following exception: initMultiTableSnapshotMapperJob takes in a map
+ * from snapshot name to a collection of scans. For each snapshot in the map, each corresponding scan will be applied;
+ * the overall dataset for the job is defined by the concatenation of the regions and tables included in each snapshot/scan
+ * {@link org.apache.hadoop.hbase.mapred.TableMapReduceUtil#initMultiTableSnapshotMapperJob(Map, Class, Class, Class, JobConf, boolean, Path)}
+ * Internally, this input format restores each snapshot into a subdirectory of the given tmp directory. Input splits and
+ * record readers are created as described in {@link org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat}

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/13816//console

This message is automatically generated.

> HBase should provide an InputFormat supporting multiple scans in mapreduce jobs over snapshots
> ----------------------------------------------------------------------------------------------
>                 Key: HBASE-13356
>                 URL: https://issues.apache.org/jira/browse/HBASE-13356
>             Project: HBase
>          Issue Type: New Feature
>          Components: mapreduce
>            Reporter: Andrew Mains
>            Assignee: Andrew Mains
>            Priority: Minor
>         Attachments: HBASE-13356.patch
> Currently, HBase supports the pushing of multiple scans to mapreduce jobs over live tables
> (via MultiTableInputFormat) but only supports a single scan for mapreduce jobs over table
> snapshots. It would be handy to support multiple scans over snapshots as well, probably through
> another input format (MultiTableSnapshotInputFormat?). To mimic the functionality present
> in MultiTableInputFormat, the new input format would likely have to take in the names of all
> snapshots used in addition to the scans.
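
For illustration only, the idea of pairing snapshot names with scans can be sketched in plain Java. This is not the patch's code and uses no HBase classes: SnapshotScanPlan, RowRange, WorkUnit and plan are hypothetical names, RowRange is a stand-in for an HBase Scan, and the restore location <tmpDir>/<snapshotName> is an assumed naming scheme. The sketch only shows how a map from snapshot name to a collection of scans might expand into one unit of work per snapshot/scan pair, so that the job's input is the concatenation of all pairs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model only; every name in this file is hypothetical.
public class SnapshotScanPlan {

    /** Stand-in for an HBase Scan: a half-open row range [startRow, stopRow). */
    public static final class RowRange {
        public final String startRow;
        public final String stopRow;

        public RowRange(String startRow, String stopRow) {
            this.startRow = startRow;
            this.stopRow = stopRow;
        }
    }

    /** One unit of work: a single scan applied to a single restored snapshot. */
    public static final class WorkUnit {
        public final String snapshot;
        public final String restoreDir;
        public final RowRange range;

        public WorkUnit(String snapshot, String restoreDir, RowRange range) {
            this.snapshot = snapshot;
            this.restoreDir = restoreDir;
            this.range = range;
        }

        @Override
        public String toString() {
            return snapshot + " @ " + restoreDir
                + " [" + range.startRow + ", " + range.stopRow + ")";
        }
    }

    /** Expands the snapshot-to-scans map into one WorkUnit per (snapshot, scan) pair. */
    public static List<WorkUnit> plan(Map<String, Collection<RowRange>> snapshotScans,
                                      String tmpDir) {
        List<WorkUnit> units = new ArrayList<>();
        for (Map.Entry<String, Collection<RowRange>> entry : snapshotScans.entrySet()) {
            // Assumption: each snapshot gets its own subdirectory named after it.
            String restoreDir = tmpDir + "/" + entry.getKey();
            for (RowRange range : entry.getValue()) {
                units.add(new WorkUnit(entry.getKey(), restoreDir, range));
            }
        }
        return units;
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the printed plan is deterministic.
        Map<String, Collection<RowRange>> scans = new LinkedHashMap<>();
        scans.put("snapshotA", Arrays.asList(new RowRange("a", "f"), new RowRange("m", "r")));
        scans.put("snapshotB", Arrays.asList(new RowRange("", "")));
        for (WorkUnit unit : plan(scans, "/tmp/restore")) {
            System.out.println(unit);
        }
    }
}
```

In the patch itself, the equivalent map is what the quoted javadoc says is passed to initMultiTableSnapshotMapperJob, along with the mapper and output classes, the JobConf, a boolean and the tmp Path.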

This message was sent by Atlassian JIRA