Return-Path: X-Original-To: apmail-hive-user-archive@www.apache.org Delivered-To: apmail-hive-user-archive@www.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 45574E33E for ; Thu, 6 Dec 2012 18:56:57 +0000 (UTC) Received: (qmail 44469 invoked by uid 500); 6 Dec 2012 18:56:55 -0000 Delivered-To: apmail-hive-user-archive@hive.apache.org Received: (qmail 44418 invoked by uid 500); 6 Dec 2012 18:56:55 -0000 Mailing-List: contact user-help@hive.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: user@hive.apache.org Delivered-To: mailing list user@hive.apache.org Received: (qmail 44410 invoked by uid 99); 6 Dec 2012 18:56:55 -0000 Received: from athena.apache.org (HELO athena.apache.org) (140.211.11.136) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Dec 2012 18:56:55 +0000 X-ASF-Spam-Status: No, hits=1.5 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DNSWL_LOW,SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (athena.apache.org: domain of ogdude@googlemail.com designates 209.85.217.176 as permitted sender) Received: from [209.85.217.176] (HELO mail-lb0-f176.google.com) (209.85.217.176) by apache.org (qpsmtpd/0.29) with ESMTP; Thu, 06 Dec 2012 18:56:48 +0000 Received: by mail-lb0-f176.google.com with SMTP id k6so6007631lbo.35 for ; Thu, 06 Dec 2012 10:56:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:date:message-id:subject:from:to:content-type; bh=aStd1HiZdH5GeLv9F3K89uwGMyvxyn9ZopiEJbzVmNM=; b=KyJ6abo3IFMTw1E8qWs0ZbZn1NvyI+kg4F4BRj1qPfKHN3rjZsEq497RNYyxxNgYLN IT7kaOTvTgX2guljwXXtIepqqXgn1vP3bFeWtMCtvubuu8BYEfoEEZ7VBXG6oLvfp9f5 zQFHdFqLoKzLJAnpgNM31tse1T3uJnHAm/eRhn6xBUxOv9KEnFT+nlW2JooeWVartZEt x497hVkKpCf3gXTGs7b/wDEu0kAilpCjsgxas3JicNnIuGnkzNIkP5wyUCTSrfcG0Sf7 DEOuV7my/zR9hDx04BvCbw2R4PJS6fpyJur4gb3lanqB2neuazuem8ZC7NBB8LPwItbg VpfA== MIME-Version: 1.0 Received: by 10.152.124.15 with SMTP id me15mr81060lab.5.1354820187184; Thu, 06 Dec 2012 10:56:27 -0800 (PST) Received: by 10.112.127.6 with HTTP; Thu, 6 Dec 2012 10:56:27 -0800 (PST) Date: Thu, 6 Dec 2012 19:56:27 +0100 Message-ID: Subject: Mapping existing HBase table with many columns to Hive. From: David Koch To: user@hive.apache.org Content-Type: multipart/alternative; boundary=f46d04374581fdddca04d033a77e X-Virus-Checked: Checked by ClamAV on apache.org --f46d04374581fdddca04d033a77e Content-Type: text/plain; charset=ISO-8859-1 Hello, How can I map an HBase table with the following layout to Hive using the "CREATE EXTERNAL TABLE" command from shell (or another programmatic way): The HBase table's layout is as follows: Rowkey=16 bytes, a UUID that had the "-" removed, and the 32hex chars converted into two 8byte longs. Columns (qualifiers): timestamps, i.e the bytes of a long which were converted using Hadoop's Bytes.toBytes(long). There can be many of those in a single row. Values: The bytes of a Java string. I am unsure of which datatypes to use. I am pretty sure there is no way I can sensible map the row key to anything other than "binary" but maybe the columns - which are longs and the values which are strings can be mapped to their according Hive datatypes. I include an extract of what a row looks like in HBase shell below: Thank you, /David hbase(main):009:0> scan "hits" ROW COLUMN+CELL \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC column=t:\x00\x00\x01;2\xE6Q\x06, timestamp=1267737987733, value=blahaha \x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC column=t:\x00\x00\x01;2\xE6\xFB@, timestamp=1354012104967, value=testtest --f46d04374581fdddca04d033a77e Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable Hello,

How can I map an HBase table with the following l= ayout to Hive using the "CREATE EXTERNAL TABLE" command from shel= l (or another programmatic way):

The HBase table&#= 39;s layout is as follows:
Rowkey=3D16 bytes, a UUID that had the "-" removed, and the = 32hex chars converted into two 8byte longs.
Columns (qualifiers):= timestamps, i.e the bytes of a long which were converted using Hadoop'= s Bytes.toBytes(long). There can be many of those in a single row.
Values: The bytes of a Java string.

I am unsu= re of which datatypes to use. I am pretty sure there is no way I can sensib= le map the row key to anything other than "binary" but maybe the = columns - which are longs and the values which are strings can be mapped to= their according Hive datatypes.

I include an extract of what a row looks like in HBase = shell below:

Thank you,

/= David

hbase(main):009:0> scan "hits&q= uot;
ROW =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 COLUMN+CELL =A0 =A0 =A0 =A0 =A0 =A0 =A0 = =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0= =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0
\x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC =A0column= =3Dt:\x00\x00\x01;2\xE6Q\x06, timestamp=3D1267737987733, value=3Dblahaha
\x00\x00\x06\xB1H\x89N\xC3\xA5\x83\x0F\xDD\x1E\xAE&\xDC = =A0column=3Dt:\x00\x00\x01;2\xE6\xFB@, timestamp=3D1354012104967, value=3Dt= esttest =A0
--f46d04374581fdddca04d033a77e--