hive-dev mailing list archives

From "Basab Maulik (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-1634) Allow access to Primitive types stored in binary format in HBase
Date Fri, 22 Oct 2010 06:29:19 GMT

    [ https://issues.apache.org/jira/browse/HIVE-1634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923776#action_12923776 ]

Basab Maulik commented on HIVE-1634:
------------------------------------

Re: Beyond the review comments I added, I do have some higher-level suggestions:

    * For the column mapping, the reason I suggested "a:b:string" in the original JIRA description
is that it's a pain to keep everything lined up by column position. It's already less than
ideal that we do the column name mapping by position, so I don't think we should make it worse
by having a separate property for type. Using the s/b shorthand is fine, and if you think
that we shouldn't overload the colon, we can use a different separator, e.g. "cf:cq#s". Since
the existing property name is hbase.columns.mapping, I don't think it will be confusing to
roll in the (optional) type info as well.

I have adopted your suggestion of '#' as the separator for the storage information, and
'hbase.columns.mapping' now optionally carries that storage information. I have also made
a small change so that any prefix of 'string' or of 'binary' is valid, i.e. s/b, str/bin,
string/binary, etc.
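
For illustration only, here is a minimal sketch of how one entry of 'hbase.columns.mapping'
such as "cf:cq#s" could be split and its storage suffix resolved. It is not the code from
the patch: the class, enum, and method names are hypothetical, falling back to a table-level
default when no '#' suffix is present is an assumption, and the "s:b" key/value form for
map types is not handled.

```java
import java.util.Locale;

// Hypothetical names throughout; this is a sketch, not the code from the patch.
class ColumnMappingSketch {

    enum StorageType { STRING, BINARY }

    /** One parsed entry of hbase.columns.mapping, e.g. "cf:cq#b". */
    static final class Entry {
        final String column;       // the part before '#', e.g. "cf:cq" or ":key"
        final StorageType storage; // resolved storage type

        Entry(String column, StorageType storage) {
            this.column = column;
            this.storage = storage;
        }
    }

    /** Accepts any non-empty prefix of "string" or "binary" (s, str, b, bin, ...). */
    static StorageType parseStorage(String spec) {
        String s = spec.trim().toLowerCase(Locale.ROOT);
        if (!s.isEmpty() && "string".startsWith(s)) return StorageType.STRING;
        if (!s.isEmpty() && "binary".startsWith(s)) return StorageType.BINARY;
        throw new IllegalArgumentException("Unknown storage type: '" + spec + "'");
    }

    /** Splits one entry on the last '#'; without a suffix the (assumed) table default applies. */
    static Entry parseEntry(String entry, StorageType tableDefault) {
        int hash = entry.lastIndexOf('#');
        if (hash < 0) return new Entry(entry.trim(), tableDefault);
        return new Entry(entry.substring(0, hash).trim(),
                         parseStorage(entry.substring(hash + 1)));
    }

    public static void main(String[] args) {
        String mapping = ":key,cf:int#b,cf:string#str,cf:double#bin,cf:long";
        for (String part : mapping.split(",")) {
            Entry e = parseEntry(part, StorageType.STRING);
            System.out.println(e.column + " -> " + e.storage);
        }
    }
}
```

Prefix matching keeps the shorthand and the long form in one code path, and an empty
suffix (as in "cf:cq#") is rejected rather than silently treated as a default.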

    * I'm wondering whether we can just use the existing classes like LazyBinaryByte in package
org.apache.hadoop.hive.serde2.lazybinary instead of creating new ones. Or are these not compatible
with hbase.util.Bytes?

I think the incompatibility stems more from trying to stay within the serde2.lazy.Lazy family
of objects, which HBaseSerDe, LazyHBaseRow, and LazyHBaseCellMap extend or depend on. It
would be useful to make these two families of classes compatible (inheriting from a common base
class). Small differences in the object inspector classes that type-parameterize these classes
further complicate getting past the type system. It should be doable, but perhaps as a separate
patch?

    * For the tests, I noticed that you have attached TestHiveHBaseExternalTable. I think
it would be a good idea if you can create and populate such a fixture table in HBaseTestSetup;
that way it can be available (treated as read-only) to all of the HBase .q tests. Otherwise,
it's hard to verify that we're compatible with a table created directly through HBase APIs
rather than Hive.

Done. I added tests that create a Hive external table associated with this HBase table and
run queries against it.

    * Also for the tests, it would be good if you can filter it down to only a small number
of representative rows when pulling the initial test data set from the Hive src table. That
way, we can keep the .q.out files smaller.

Done. The .q.out files are a lot smaller than in the initial patch.

    * Once we get this one committed, be sure to update the wiki.

Will do once this is committed.


> Allow access to Primitive types stored in binary format in HBase
> ----------------------------------------------------------------
>
>                 Key: HIVE-1634
>                 URL: https://issues.apache.org/jira/browse/HIVE-1634
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>    Affects Versions: 0.7.0
>            Reporter: Basab Maulik
>            Assignee: Basab Maulik
>         Attachments: HIVE-1634.0.patch, TestHiveHBaseExternalTable.java
>
>
> This addresses HIVE-1245 in part, for atomic or primitive types.
> The serde property "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" is a specification
of the storage option for the corresponding column in the serde property "hbase.columns.mapping".
Allowed values are '-' for table default, 's' for standard string storage, and 'b' for binary
storage as would be obtained from o.a.h.hbase.util.Bytes. Map types for HBase column families
use a colon separated pair such as 's:b' for the key and value part specifiers respectively.
See the test cases and queries for HBase handler for additional examples.
> There is also a table property "hbase.table.default.storage.type" = "string" to specify
a table level default storage type. The other valid specification is "binary". The table level
default is overridden by a column level specification.
> This control is available for the boolean, tinyint, smallint, int, bigint, float, and
double primitive types. The attached patch also relaxes the mapping of map types to HBase
column families to allow any primitive type to be the map key.
> Attached is a program for creating a table and populating it in HBase. The external table
in Hive can access the data as shown in the example below.
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties ("hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double")
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.691 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	NULL	NULL	NULL	NULL	NULL	Test-String	NULL	NULL
> Time taken: 0.346 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.139 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,b,b,b" )
>     >  tblproperties (
>     >  "hbase.table.name" = "TestHiveHBaseExternalTable",
>     >  "hbase.table.default.storage.type" = "string");
> OK
> Time taken: 0.139 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
> Time taken: 0.151 seconds
> hive> drop table TestHiveHBaseExternalTable;
> OK
> Time taken: 0.154 seconds
> hive> create external table TestHiveHBaseExternalTable
>     > (key string, c_bool boolean, c_byte tinyint, c_short smallint,
>     >  c_int int, c_long bigint, c_string string, c_float float, c_double double)
>     >  stored by 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
>     >  with serdeproperties (
>     >  "hbase.columns.mapping" = ":key,cf:boolean,cf:byte,cf:short,cf:int,cf:long,cf:string,cf:float,cf:double",
>     >  "hbase.columns.storage.types" = "-,b,b,b,b,b,-,b,b" )
>     >  tblproperties ("hbase.table.name" = "TestHiveHBaseExternalTable");
> OK
> Time taken: 0.347 seconds
> hive> select * from TestHiveHBaseExternalTable;
> OK
> key-1	true	-128	-32768	-2147483648	-9223372036854775808	Test-String	-2.1793132E-11	2.01345E291
> Time taken: 0.245 seconds
> hive> 
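
One plausible reading of the NULLs in the first query above is that binary-encoded cells
were being read as text. The sketch below illustrates that with an int: it assumes binary
storage is the 4-byte big-endian layout that o.a.h.hbase.util.Bytes produces for an int,
and it uses java.nio.ByteBuffer as a stand-in so the example has no HBase dependency. The
class and method names are hypothetical, and mapping unparsable text to null is an
assumption made to mirror the NULL output.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical names; ByteBuffer stands in for HBase's Bytes utility here.
class StorageDecodingSketch {

    /** Binary storage: a 4-byte big-endian int (ByteBuffer's default byte order). */
    static byte[] encodeBinary(int value) {
        return ByteBuffer.allocate(4).putInt(value).array();
    }

    /** Reads the cell back as a 4-byte big-endian int. */
    static int decodeBinary(byte[] cell) {
        return ByteBuffer.wrap(cell).getInt();
    }

    /** String storage: the cell holds decimal text; unparsable text yields null. */
    static Integer decodeString(byte[] cell) {
        try {
            return Integer.valueOf(new String(cell, StandardCharsets.UTF_8));
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        byte[] cell = encodeBinary(Integer.MIN_VALUE);
        System.out.println(decodeString(cell)); // text decoding of binary bytes fails
        System.out.println(decodeBinary(cell)); // binary decoding recovers the value
    }
}
```

The same cell bytes are valid under one storage type and meaningless under the other,
which is why the storage type has to be declared per column or per table.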

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

