Date: Mon, 20 Jan 2014 22:32:22 +0000 (UTC)
From: "Ted Yu (JIRA)"
To: issues@hbase.apache.org
Subject: [jira] [Updated] (HBASE-10323) Auto detect data block encoding in HFileOutputFormat

     [ https://issues.apache.org/jira/browse/HBASE-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ted Yu updated HBASE-10323:
---------------------------
    Fix Version/s: 0.98.0
     Hadoop Flags: Reviewed

Integrated to 0.98 as well.

> Auto detect data block encoding in HFileOutputFormat
> ----------------------------------------------------
>
>                 Key: HBASE-10323
>                 URL: https://issues.apache.org/jira/browse/HBASE-10323
>             Project: HBase
>          Issue Type: Improvement
>          Components: mapreduce
>            Reporter: Ishan Chhabra
>            Assignee: Ishan Chhabra
>             Fix For: 0.98.0, 0.99.0
>
>         Attachments: HBASE_10323-0.94.15-v1.patch, HBASE_10323-0.94.15-v2.patch, HBASE_10323-0.94.15-v3.patch, HBASE_10323-0.94.15-v4.patch, HBASE_10323-0.94.15-v5.patch, HBASE_10323-trunk-v1.patch, HBASE_10323-trunk-v2.patch, HBASE_10323-trunk-v3.patch, HBASE_10323-trunk-v4.patch
>
>
> Currently, one has to specify the data block encoding of the table explicitly using the config parameter "hbase.mapreduce.hfileoutputformat.datablock.encoding" when doing a bulk load. This option is easily missed, is not documented, and works differently from compression, block size and bloom filter type, which are auto detected.
> The solution would be to add support for auto detecting the data block encoding, as is already done for the other parameters.
> The current patch does the following:
> 1. Automatically detects the data block encoding in HFileOutputFormat.
> 2. Keeps the legacy option of manually specifying the data block encoding
>    as a way to override auto detection.
> 3. Moves string conf parsing to the start of the program so that it fails
>    fast at startup instead of failing during record writes. It also
>    makes the internals of the program type safe.
> 4. Adds missing doc strings and unit tests for the code serializing and
>    deserializing config parameters for bloom filter type, block size and
>    data block encoding.
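
For readers of the archive, points 2 to 4 of the description above can be illustrated with a small standalone sketch. It is not the attached patch and does not use HBase classes: the `Encoding` enum is a local stand-in for HBase's `DataBlockEncoding`, and the key `sketch.families.datablock.encoding`, the `Resolver` class and the `cf=ENCODING&cf=ENCODING` wire format are all hypothetical names chosen for illustration. Only the override key is taken from the issue text.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

final class EncodingConfigSketch {

    // Local stand-in for HBase's DataBlockEncoding (assumption: the real enum
    // lives in HBase and may have other members).
    enum Encoding { NONE, PREFIX, DIFF, FAST_DIFF }

    // Key quoted in the issue; kept as a manual override.
    static final String OVERRIDE_KEY =
        "hbase.mapreduce.hfileoutputformat.datablock.encoding";

    // Hypothetical key holding the auto-detected per-family encodings.
    static final String FAMILIES_KEY = "sketch.families.datablock.encoding";

    /** Serializes family -> encoding as "cf1=DIFF&cf2=NONE".
     *  Assumes family names contain neither '=' nor '&'. */
    static String serialize(Map<String, Encoding> byFamily) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Encoding> e : byFamily.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(e.getValue().name());
        }
        return sb.toString();
    }

    /** Parses the serialized map into enum values; throws
     *  IllegalArgumentException on a malformed entry or unknown encoding. */
    static Map<String, Encoding> deserialize(String serialized) {
        Map<String, Encoding> out = new LinkedHashMap<>();
        if (serialized == null || serialized.isEmpty()) {
            return out;
        }
        for (String pair : serialized.split("&")) {
            String[] kv = pair.split("=", 2);
            if (kv.length != 2) {
                throw new IllegalArgumentException("Malformed entry: " + pair);
            }
            out.put(kv[0], Encoding.valueOf(kv[1]));
        }
        return out;
    }

    /** Built once at job start, so bad config strings fail before any
     *  record is written; afterwards only typed values are used. */
    static final class Resolver {
        private final Encoding override; // null when no override is set
        private final Map<String, Encoding> detected;

        Resolver(Map<String, String> conf) {
            String o = conf.get(OVERRIDE_KEY);
            this.override = (o == null) ? null : Encoding.valueOf(o);
            this.detected = deserialize(conf.get(FAMILIES_KEY));
        }

        /** The override wins; otherwise the detected value, else NONE. */
        Encoding forFamily(String family) {
            if (override != null) {
                return override;
            }
            return detected.getOrDefault(family, Encoding.NONE);
        }
    }

    public static void main(String[] args) {
        Map<String, Encoding> fromTable = new LinkedHashMap<>();
        fromTable.put("cf1", Encoding.FAST_DIFF);
        fromTable.put("cf2", Encoding.PREFIX);

        Map<String, String> conf = new HashMap<>();
        conf.put(FAMILIES_KEY, serialize(fromTable));

        Resolver resolver = new Resolver(conf);
        System.out.println("cf1 -> " + resolver.forFamily("cf1"));
        System.out.println("cf2 -> " + resolver.forFamily("cf2"));
    }
}
```

Parsing in the `Resolver` constructor is what gives the fail-fast behaviour: an unknown encoding name raises `IllegalArgumentException` when the job is set up, not in the middle of writing records.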