hbase-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-12590) A solution for data skew in HBase-Mapreduce Job
Date Wed, 03 Dec 2014 13:01:12 GMT

    [ https://issues.apache.org/jira/browse/HBASE-12590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14232972#comment-14232972
] 

Hadoop QA commented on HBASE-12590:
-----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12684872/HBase-12590-v2.patch
  against master branch at commit 13a1eaec09a467153adc1ee0b46df9f457da6115.
  ATTACHMENT ID: 12684872

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new or modified
tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of
javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 checkstyle{color}.  The applied patch does not increase the total number
of checkstyle errors

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version
2.0.3) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number
of release audit warnings.

    {color:green}+1 lineLengths{color}.  The patch does not introduce lines longer than 100

  {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

     {color:red}-1 core tests{color}.  The patch failed these unit tests:
     

     {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s): 	at org.apache.hadoop.hbase.master.balancer.TestDefaultLoadBalancer.testBalanceCluster(TestDefaultLoadBalancer.java:119)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//artifact/patchprocess/checkstyle-aggregate.html

  Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11927//console

This message is automatically generated.

> A solution for data skew in HBase-Mapreduce Job
> -----------------------------------------------
>
>                 Key: HBASE-12590
>                 URL: https://issues.apache.org/jira/browse/HBASE-12590
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Weichen Ye
>         Attachments: A Solution for Data Skew in HBase-MapReduce Job (Version2).pdf,
HBase-12590-v1.patch, HBase-12590-v2.patch
>
>
> 1, Motivation
> In production environment, data skew is a very common case. A HBase table always contains
a lot of small regions and several large regions. Small regions waste a lot of computing resources.
If we use a job to scan a table with 3000 small regions, we need a job with 3000 mappers.
Large regions always block the job. If in a 100-region table, one region is far larger then
the other 99 regions. When we run a job with the table as input, 99 mappers will be completed
very quickly, and we need to wait for the last mapper for a long time.
> 2, Configuration
> Add two new configuration. 
> hbase.mapreduce.split.autobalance = true means enabling the “auto balance” in HBase-MapReduce
jobs. The default value is false. 
> hbase.mapreduce.split.targetsize = 1073741824 (default 1GB). The target size of mapreduce
splits. 
> If a region size is large than the target size, cut the region into two split.If the
sum of several small continuous region size less than the target size, combine these regions
into one split.
> Example:
> In attachment
> Welcome to the Review Board.
> https://reviews.apache.org/r/28494/diff/#



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message