Mailing-List: contact dev-help@hive.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@hive.apache.org
Date: Thu, 20 Jun 2013 02:16:20 +0000 (UTC)
From: "Navis (JIRA)" <jira@apache.org>
To: hive-dev@hadoop.apache.org
Message-ID: <JIRA.12653840.1371694474698.145805.1371694580322@arcas>
In-Reply-To: <JIRA.12653840.1371694474698@arcas>
References: <JIRA.12653840.1371694474698@arcas>
Subject: [jira] [Created] (HIVE-4765) Improve HBase bulk loading facility
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit

Navis created HIVE-4765:
---------------------------

             Summary: Improve HBase bulk loading facility
                 Key: HIVE-4765
                 URL: https://issues.apache.org/jira/browse/HIVE-4765
             Project: Hive
          Issue Type: Improvement
          Components: HBase Handler
            Reporter: Navis
            Assignee: Navis
            Priority: Minor


With some patches, the bulk loading process for HBase could be simplified considerably.
{noformat}
CREATE EXTERNAL TABLE hbase_export(rowkey STRING, col1 STRING, col2 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.hbase.HBaseExportSerDe'
WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:key,cf2:value")
STORED AS
  INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.hbase.HiveHFileExporter'
LOCATION '/tmp/export';

SET mapred.reduce.tasks=4;
SET hive.optimize.sampling.orderby=true;

INSERT OVERWRITE TABLE hbase_export
SELECT * FROM (
  SELECT union_kv(key,key,value,":key,cf1:key,cf2:value") AS (rowkey,union) FROM src
) A
ORDER BY rowkey,union;

-- Run from a shell, not from Hive: load the HFiles under /tmp/export into the HBase table 'test'.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/export test
{noformat}
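
For readers wondering why the ORDER BY and hive.optimize.sampling.orderby matter here: HFiles must hold row keys in sorted order, and a sampled order-by lets several reducers each write one sorted, non-overlapping key range. The following is only a rough sketch of that idea in Python, not Hive's actual implementation; the function names (split_points, partition_for, total_order_sort) and the sample size are made up for illustration.

```python
import bisect
import random


def split_points(sample, num_partitions):
    """Pick num_partitions - 1 boundary keys from a sample of the input keys."""
    ordered = sorted(sample)
    if not ordered:
        return []
    # Integer arithmetic keeps every index inside the sample.
    return [ordered[len(ordered) * i // num_partitions]
            for i in range(1, num_partitions)]


def partition_for(key, boundaries):
    """Index of the key range (the hypothetical 'reducer') that owns this key."""
    return bisect.bisect_right(boundaries, key)


def total_order_sort(keys, num_partitions, sample_size=100, seed=0):
    """Range-partition keys by sampled boundaries, then sort each range alone."""
    rng = random.Random(seed)
    sample = rng.sample(keys, min(sample_size, len(keys)))
    boundaries = split_points(sample, num_partitions)
    partitions = [[] for _ in range(num_partitions)]
    for key in keys:
        partitions[partition_for(key, boundaries)].append(key)
    # Each range is sorted independently; concatenating the ranges in order
    # then yields a globally sorted sequence.
    return [sorted(p) for p in partitions]


if __name__ == "__main__":
    keys = ["row%04d" % i for i in range(1000)]
    random.Random(1).shuffle(keys)
    for i, part in enumerate(total_order_sort(keys, 4)):
        if part:
            print(i, len(part), part[0], part[-1])
```

With 4 partitions this mirrors mapred.reduce.tasks=4 in the example above: each output range could become its own file, and the ranges do not overlap, which is roughly what the bulk loader needs.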
