hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-8073) HFileOutputFormat support for offline operation
Date Sat, 31 May 2014 04:58:02 GMT

    [ https://issues.apache.org/jira/browse/HBASE-8073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14014526#comment-14014526 ]

Hadoop QA commented on HBASE-8073:
----------------------------------

{color:red}-1 overall{color}.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12647743/HBASE-8073-trunk-v0.patch
  against trunk revision .
  ATTACHMENT ID: 12647743

    {color:green}+1 @author{color}.  The patch does not contain any @author tags.

    {color:green}+1 tests included{color}.  The patch appears to include 6 new or modified tests.

    {color:green}+1 javac{color}.  The applied patch does not increase the total number of javac compiler warnings.

    {color:green}+1 javadoc{color}.  The javadoc tool did not generate any warning messages.

    {color:green}+1 findbugs{color}.  The patch does not introduce any new Findbugs (version 1.3.9) warnings.

    {color:green}+1 release audit{color}.  The applied patch does not increase the total number of release audit warnings.

    {color:red}-1 lineLengths{color}.  The patch introduces the following lines longer than 100:
    +  static void configureCompression(HTableDescriptor tableDesc, Configuration conf) throws IOException {
    +  static void configureBlockSize(HTableDescriptor tableDesc, Configuration conf) throws IOException {
    +  static void configureBloomType(HTableDescriptor tableDesc, Configuration conf) throws IOException {

    {color:green}+1 site{color}.  The mvn site goal succeeds with this patch.

    {color:red}-1 core tests{color}.  The patch failed these unit tests:
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2
                       org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9657//console

This message is automatically generated.

> HFileOutputFormat support for offline operation
> -----------------------------------------------
>
>                 Key: HBASE-8073
>                 URL: https://issues.apache.org/jira/browse/HBASE-8073
>             Project: HBase
>          Issue Type: Sub-task
>          Components: mapreduce
>            Reporter: Nick Dimiduk
>             Fix For: 0.99.0
>
>         Attachments: HBASE-8073-trunk-v0.patch
>
>
> When HFileOutputFormat is used to generate HFiles, it inspects the region topology of the target table. The split points from that table are used to guide the TotalOrderPartitioner. If the target table does not exist, it is first created. This imposes an unnecessary dependence on an online HBase and an existing table.
> If the table exists, it can be used. However, the job can be smarter. For example, if there's far more data going into the HFiles than the table currently contains, the table regions aren't very useful as data split points. Instead, the input data can be sampled to produce split points more meaningful to the dataset. LoadIncrementalHFiles is already capable of handling divergence between HFile boundaries and table regions, so this should not pose any additional burden at load time.
> The proper method of sampling the data likely requires a custom input format and an additional map-reduce job to perform the sampling. See a relevant implementation: https://github.com/alexholmes/hadoop-book/blob/master/src/main/java/com/manning/hip/ch4/sampler/ReservoirSamplerInputFormat.java
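
For illustration only, the sampling idea in the description can be sketched in plain Java. This is not code from the attached patch or from the linked ReservoirSamplerInputFormat. The class name SplitPointSampler and both method names are hypothetical, and String keys stand in for HBase row keys (which are byte arrays). It draws a uniform reservoir sample from a stream of keys, then picks evenly spaced keys from the sorted sample as candidate split points for a total-order partitioner.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

public class SplitPointSampler {

  /**
   * Reservoir sampling (Algorithm R): keeps a uniform random sample of at
   * most k keys from a stream whose length is not known in advance.
   */
  static List<String> reservoirSample(Iterator<String> keys, int k, Random rng) {
    List<String> reservoir = new ArrayList<>(k);
    long seen = 0;
    while (keys.hasNext()) {
      String key = keys.next();
      seen++;
      if (reservoir.size() < k) {
        // Fill the reservoir with the first k keys.
        reservoir.add(key);
      } else {
        // Keep the new key with probability k / seen by drawing a slot
        // index in [0, seen) and replacing only if it falls inside the reservoir.
        long j = (long) (rng.nextDouble() * seen);
        if (j < k) {
          reservoir.set((int) j, key);
        }
      }
    }
    return reservoir;
  }

  /**
   * Picks up to numPartitions - 1 evenly spaced keys from the sorted sample.
   * Duplicate candidates are skipped so the boundaries are strictly increasing.
   */
  static List<String> splitPoints(List<String> sample, int numPartitions) {
    List<String> sorted = new ArrayList<>(sample);
    Collections.sort(sorted);
    List<String> splits = new ArrayList<>();
    if (sorted.isEmpty()) {
      return splits;
    }
    for (int i = 1; i < numPartitions; i++) {
      int idx = (int) ((long) i * sorted.size() / numPartitions);
      String candidate = sorted.get(idx);
      if (splits.isEmpty() || !splits.get(splits.size() - 1).equals(candidate)) {
        splits.add(candidate);
      }
    }
    return splits;
  }
}
```

In a real job the sample would be collected by a separate sampling pass, as the description suggests, and the resulting split points written to the partition file read by TotalOrderPartitioner. That wiring is omitted here.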



--
This message was sent by Atlassian JIRA
(v6.2#6252)
