hbase-issues mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-18161) Incremental Load support for Multiple-Table HFileOutputFormat
Date Tue, 27 Jun 2017 06:12:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-18161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16064334#comment-16064334 ]

Hadoop QA commented on HBASE-18161:
-----------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 24s {color} | {color:blue} Docker mode activated. {color} |
| {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} |
| {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} |
| {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 2 new or modified test files. {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 8m 15s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 50s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 36s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} master passed {color} |
| {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 7m 6s {color} | {color:red} hbase-server in master has 10 extant Findbugs warnings. {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 12s {color} | {color:green} master passed {color} |
| {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 53s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 42s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 32s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} The patch has no whitespace issues. {color} |
| {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 73m 38s {color} | {color:green} Patch does not cause any errors with Hadoop 2.6.1 2.6.2 2.6.3 2.6.4 2.6.5 2.7.1 2.7.2 2.7.3 or 3.0.0-alpha3. {color} |
| {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 7m 44s {color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s {color} | {color:green} the patch passed {color} |
| {color:red}-1{color} | {color:red} unit {color} | {color:red} 220m 25s {color} | {color:red} hbase-server in the patch failed. {color} |
| {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 1m 47s {color} | {color:green} The patch does not generate ASF License warnings. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 332m 33s {color} | {color:black} {color} |
\\
\\
|| Reason || Tests ||
| Failed junit tests | hadoop.hbase.regionserver.TestRegionReplicaFailover |
|   | hadoop.hbase.client.TestMobSnapshotCloneIndependence |
|   | hadoop.hbase.regionserver.TestEncryptionKeyRotation |
|   | hadoop.hbase.regionserver.TestPerColumnFamilyFlush |
|   | hadoop.hbase.security.access.TestCoprocessorWhitelistMasterObserver |
| Timed out junit tests | org.apache.hadoop.hbase.replication.regionserver.TestWALEntryStream |
|   | org.apache.hadoop.hbase.client.TestFromClientSide3 |
|   | org.apache.hadoop.hbase.quotas.TestSpaceQuotas |
|   | org.apache.hadoop.hbase.client.TestFromClientSideWithCoprocessor |
|   | org.apache.hadoop.hbase.client.TestMobRestoreSnapshotFromClient |
|   | org.apache.hadoop.hbase.filter.TestFuzzyRowFilterEndToEnd |
\\
\\
|| Subsystem || Report/Notes ||
| Docker | Client=17.03.0-ce Server=17.03.0-ce Image:yetus/hbase:757bf37 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12874594/MultiHFileOutputFormatSupport_HBASE_18161_v11.patch |
| JIRA Issue | HBASE-18161 |
| Optional Tests |  asflicense  javac  javadoc  unit  findbugs  hadoopcheck  hbaseanti  checkstyle  compile  |
| uname | Linux af818bf1f967 4.8.3-std-1 #1 SMP Fri Oct 21 11:15:43 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/component/dev-support/hbase-personality.sh |
| git revision | master / 35693f0 |
| Default Java | 1.8.0_131 |
| findbugs | v3.1.0-RC1 |
| findbugs | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/artifact/patchprocess/branch-findbugs-hbase-server-warnings.html |
| unit | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/artifact/patchprocess/patch-unit-hbase-server.txt |
| unit test logs |  https://builds.apache.org/job/PreCommit-HBASE-Build/7346/artifact/patchprocess/patch-unit-hbase-server.txt |
|  Test Results | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/testReport/ |
| modules | C: hbase-server U: hbase-server |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/7346/console |
| Powered by | Apache Yetus 0.3.0   http://yetus.apache.org |


This message was automatically generated.



> Incremental Load support for Multiple-Table HFileOutputFormat
> -------------------------------------------------------------
>
>                 Key: HBASE-18161
>                 URL: https://issues.apache.org/jira/browse/HBASE-18161
>             Project: HBase
>          Issue Type: New Feature
>            Reporter: Densel Santhmayor
>            Priority: Minor
>         Attachments: MultiHFileOutputFormatSupport_HBASE_18161.patch, MultiHFileOutputFormatSupport_HBASE_18161_v10.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v11.patch, MultiHFileOutputFormatSupport_HBASE_18161_v2.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v3.patch, MultiHFileOutputFormatSupport_HBASE_18161_v4.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v5.patch, MultiHFileOutputFormatSupport_HBASE_18161_v6.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v7.patch, MultiHFileOutputFormatSupport_HBASE_18161_v8.patch,
>                      MultiHFileOutputFormatSupport_HBASE_18161_v9.patch
>
>
> h2. Introduction
> MapReduce currently supports writing HBase records in bulk to HFiles for a single table. The file(s) can then be uploaded to the relevant RegionServers with reasonable latency. This feature is useful for making a large set of data available for queries at the same time, and it also provides a way to efficiently ingest very large input into HBase without affecting query latencies.
> There is, however, no support for writing variations of the same record key to HFiles belonging to multiple HBase tables from within the same MapReduce job.
> h2. Goal
> The goal of this JIRA is to extend HFileOutputFormat2 to support writing to HFiles for different tables within the same MapReduce job while keeping single-table HFile features backwards-compatible.
> For our use case, we needed to write a record key to a smaller HBase table for quicker access, and the same record key with a date appended to a larger table for longer-term storage with chronological access. Each of these tables would have different TTL and other settings to support their respective access patterns. We also needed to bulk write records to multiple tables, each receiving a different subset of very large input, as efficiently as possible. Rather than running the MapReduce job multiple times (once for each table or record structure), it would be useful to parse the input a single time and write to multiple tables simultaneously.
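A minimal sketch of the fan-out this use case describes: a mapper parses one input record and derives one row key per destination table. The table names `recent` and `history` and the `yyyyMMdd` date suffix are assumptions for illustration; the issue does not specify them.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not part of the proposed patch. */
final class FanOutExample {
  static final String RECENT_TABLE = "recent";   // hypothetical small, short-TTL table
  static final String HISTORY_TABLE = "history"; // hypothetical large, long-TTL table

  /** A (table, row key) pair that a mapper would emit. */
  static final class Target {
    final String table;
    final String rowKey;

    Target(String table, String rowKey) {
      this.table = table;
      this.rowKey = rowKey;
    }
  }

  /** Derives both destination row keys from a single parsed record. */
  static List<Target> targetsFor(String recordKey, LocalDate date) {
    List<Target> out = new ArrayList<>();
    // Same key as-is, for quick point lookups in the smaller table.
    out.add(new Target(RECENT_TABLE, recordKey));
    // Key with a fixed-width yyyyMMdd date appended, so rows for one key sort
    // chronologically. With variable-length keys, a delimiter would be needed
    // to stop a prefix scan for one key from matching longer keys.
    out.add(new Target(HISTORY_TABLE,
        recordKey + date.format(DateTimeFormatter.BASIC_ISO_DATE)));
    return out;
  }
}
```

The point of the proposal is that both targets come out of one pass over the input instead of two separate jobs.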
> Additionally, we'd like to maintain backwards compatibility with the existing, heavily used HFileOutputFormat2 interface so that features such as locality sensitivity (which was introduced long after we implemented support for multiple tables) benefit both single-table and multi-table HFile writes.
> h2. Proposal
> * Backwards compatibility for the existing single-table support in HFileOutputFormat2 will be maintained; in this case, mappers will emit the table rowkey as before. However, a new class, MultiHFileOutputFormat, will provide a helper function for mappers that generates a rowkey by prefixing the desired table name to the existing rowkey, and will also provide configureIncrementalLoad support for multiple tables.
> * HFileOutputFormat2 will be updated in the following way:
> ** configureIncrementalLoad will now accept multiple table descriptor and region locator pairs, analogous to the single pair currently accepted by HFileOutputFormat2.
> ** Per-column-family compression, block size, Bloom type, and data block encoding settings stored in the Configuration object will now be indexed and retrieved by table name AND column family.
> ** getRegionStartKeys will now accept multiple RegionLocators and calculate split points, and therefore partitions, collectively for all tables. As a result, the number of reducers will equal the total number of partitions across all tables.
> ** The RecordWriter class will be able to process rowkeys either with or without the table name prepended, depending on whether configureIncrementalLoad was called through MultiHFileOutputFormat or HFileOutputFormat2.
> * MultiHFileOutputFormat will write its output into HFiles that match the output format of HFileOutputFormat2. However, while the default single-table case keeps the existing directory structure (one directory per column family, containing that family's HFiles), MultiHFileOutputFormat will write HFiles into the output directory under the following relative paths:
> {noformat}
>      --table1 
>        --family1 
>          --HFiles 
>      --table2 
>        --family1 
>        --family2 
>          --HFiles
> {noformat}
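The row-key prefixing, RecordWriter, and directory-layout points of the proposal could look roughly like the following. This is a hypothetical sketch, not the patch's API: the class, the method names, and the `;` separator are assumptions. Because HBase table names are restricted to letters, digits, `_`, `-`, `.` and the `:` namespace delimiter, splitting at the first `;` recovers the table name even when the row key itself contains `;`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/**
 * Hypothetical helpers: a mapper prefixes the destination table name to its
 * row key, and the writer splits it back apart to choose the output directory.
 */
final class CompositeKeys {
  // Assumed separator; it cannot occur in an HBase table name.
  private static final byte SEPARATOR = ';';

  /** Builds tableName + ';' + rowKey, the key a mapper would emit. */
  static byte[] compositeKey(String table, byte[] rowKey) {
    byte[] t = table.getBytes(StandardCharsets.UTF_8);
    byte[] out = new byte[t.length + 1 + rowKey.length];
    System.arraycopy(t, 0, out, 0, t.length);
    out[t.length] = SEPARATOR;
    System.arraycopy(rowKey, 0, out, t.length + 1, rowKey.length);
    return out;
  }

  // The first ';' always ends the table name, even if the row key contains ';'.
  private static int separatorIndex(byte[] key) {
    for (int i = 0; i < key.length; i++) {
      if (key[i] == SEPARATOR) {
        return i;
      }
    }
    throw new IllegalArgumentException("not a composite key");
  }

  /** Table name part of a composite key. */
  static String tableOf(byte[] key) {
    return new String(key, 0, separatorIndex(key), StandardCharsets.UTF_8);
  }

  /** Original row key part of a composite key. */
  static byte[] rowKeyOf(byte[] key) {
    return Arrays.copyOfRange(key, separatorIndex(key) + 1, key.length);
  }

  /**
   * Relative output directory for an HFile, e.g. "table1/family1".
   * Namespaced names such as "ns:table" would need escaping for a
   * filesystem path; that is not handled here.
   */
  static String outputDir(byte[] key, String family) {
    return tableOf(key) + "/" + family;
  }
}
```

A length prefix would work as well and would not depend on the table-name character rules; a separator keeps composite keys readable in logs.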
> This aims to be a comprehensive solution to the original tickets, HBASE-3727 and HBASE-16261. Thanks to [~clayb] for his support. This is a contribution from Bloomberg developers.
> The patch will be attached shortly.
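On the job-setup side, the configureIncrementalLoad changes proposed above (per-family settings indexed by table and family, and split points computed collectively across all tables with one reducer per partition) can be sketched as follows. All names, the `;` separator, and the in-memory map standing in for RegionLocators are assumptions for illustration. Region start keys from every table become composite split points, sorted in unsigned byte order (the order HBase uses for row keys), and a binary search picks the reducer for each composite key, in the spirit of a total-order partitioner.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of multi-table job setup; not the HBase API. */
final class MultiTableSetup {
  private static final byte SEPARATOR = ';';

  /** Configuration index for a per-family setting, keyed by table AND family. */
  static String settingKey(String table, String family) {
    return table + ";" + family;
  }

  /** Builds tableName + ';' + rowKey. */
  static byte[] compositeKey(String table, byte[] rowKey) {
    byte[] t = table.getBytes(StandardCharsets.UTF_8);
    byte[] out = new byte[t.length + 1 + rowKey.length];
    System.arraycopy(t, 0, out, 0, t.length);
    out[t.length] = SEPARATOR;
    System.arraycopy(rowKey, 0, out, t.length + 1, rowKey.length);
    return out;
  }

  /** Unsigned lexicographic byte comparison. */
  static int compareBytes(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    for (int i = 0; i < n; i++) {
      int diff = (a[i] & 0xff) - (b[i] & 0xff);
      if (diff != 0) {
        return diff;
      }
    }
    return a.length - b.length;
  }

  /**
   * Merges the region start keys of every table into one sorted list of
   * composite split points: one partition, and so one reducer, per region.
   */
  static List<byte[]> splitPoints(Map<String, List<byte[]>> startKeysByTable) {
    List<byte[]> points = new ArrayList<>();
    for (Map.Entry<String, List<byte[]>> e : startKeysByTable.entrySet()) {
      for (byte[] startKey : e.getValue()) {
        points.add(compositeKey(e.getKey(), startKey));
      }
    }
    points.sort(MultiTableSetup::compareBytes);
    return points;
  }

  /** Index of the last split point <= key, i.e. the reducer for that key. */
  static int partitionFor(List<byte[]> points, byte[] key) {
    int lo = 0;
    int hi = points.size() - 1;
    int result = 0;
    while (lo <= hi) {
      int mid = (lo + hi) >>> 1;
      if (compareBytes(points.get(mid), key) <= 0) {
        result = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    return result;
  }
}
```

For example, if one table has two regions (start keys empty and `m`) and another has a single region, `splitPoints` yields three composite split points, so this sketch would run three reducers.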



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
