hadoop-hive-dev mailing list archives

From "Samuel Guo (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables
Date Sun, 23 Aug 2009 12:20:59 GMT

    [ https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746592#action_12746592 ]

Samuel Guo commented on HIVE-705:
---------------------------------

Attached a new patch.

1) Moved the HBase-related code to the contrib package, since HBase is only an optional
storage backend for Hive, not a required one.
I have tried to avoid modifying the original Hive code and to add only an HBase SerDe to
connect Hive with HBase. But the HBase storage model is quite different from the file storage
model. For example, when a query's target is a Hive table, a loadwork is used to rename/copy
files from the temp dir to the target table's dir. With an HBase-backed Hive table, we cannot
rename a table at present, so it is hard to make an HBase-backed Hive table follow the logic
of a normal file-based Hive table. I therefore added some code (HiveFormatUtils) to
distinguish file-based tables from non-file-based tables.

2) Fixed some bugs in the draft patch, such as "select *" returning nothing.

----------------------------------------------------------------------------------------------

How to use HBase as Hive's storage?

1) Remember to add the contrib jar and the HBase jar to Hive's auxPath, so that m/r can
distribute the necessary HBase-related jars to the whole Hadoop m/r cluster.

> $HIVE_HOME/bin/hive -auxPath ${contrib_jar},${hbase_jar}

2) Add the following parameters to the configuration.

"hbase.master" : points to the HBase master.
"hive.othermetadata.handlers" : "org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat:org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler"

"hive.othermetadata.handlers" lists the metadata handlers that take care of the additional
metadata operations for non-file-based Hive tables. Take HBase as an example: HBaseMetadataHandler
creates the necessary HBase table and its column families when we create an HBase-backed Hive
table from the Hive client. It also drops the HBase table when we drop the Hive table.

The metastore reads the registered handlers map from the configuration file during initialization.
The registered handlers map is formatted as "table_format_classname:table_metadata_handler_classname,table_format_classname:table_metadata_handler_classname,...".
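
To make the handlers map format concrete, here is a minimal sketch of how such a value could
be split into (table format class, metadata handler class) pairs. It is not taken from the
patch: the class name HandlerMapParser and its parse method are hypothetical, and the error
handling (rejecting entries without exactly one ':') is an assumption.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the HIVE-705 patch.
class HandlerMapParser {

    // Parses "format:handler,format:handler,..." into an insertion-ordered map
    // from table format class name to metadata handler class name.
    static Map<String, String> parse(String value) {
        Map<String, String> handlers = new LinkedHashMap<>();
        if (value == null || value.trim().isEmpty()) {
            return handlers; // nothing registered
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue; // tolerate stray or trailing commas
            }
            String[] pair = trimmed.split(":");
            // Assumption: each entry must hold exactly one format and one handler.
            if (pair.length != 2 || pair[0].trim().isEmpty() || pair[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Malformed handler entry: " + trimmed);
            }
            handlers.put(pair[0].trim(), pair[1].trim());
        }
        return handlers;
    }

    public static void main(String[] args) {
        String value = "org.apache.hadoop.hive.contrib.hbase.HiveHBaseTableInputFormat"
                + ":org.apache.hadoop.hive.contrib.hbase.HBaseMetadataHandler";
        parse(value).forEach((format, handler) ->
                System.out.println(format + " -> " + handler));
    }
}
```

Keying the map by the table format class name would let a lookup go from a table's format
to its handler, which fits the format:handler ordering of each entry.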

3) enjoy "hive over hbase"!

------------------------------------------------------------------------

Other problems.

1) Altering an HBase-backed Hive table is not supported now. :(
Renaming a table in HBase is not supported now, so I simply do not support the rename operation.
(Maybe if we rename a Hive table, we do not need to rename the underlying HBase table.)

Adding/replacing columns:
Currently the schema mapping must be specified explicitly in the SerDe properties. To add
columns, we need to call 'alter' twice: once to change the SerDe properties and once to change
the Hive columns. At present either order fails, because the schema mapping is validated
during SerDe initialization. One of the HBase SerDe validations checks that the number of Hive
columns matches the number of HBase mapping columns. If we change the Hive columns first,
there are more Hive columns than HBase mapping columns, so HBase SerDe initialization fails
the alter operation. (Maybe we need to remove the validation code from HBaseSerDe
initialization and do it somewhere else?)
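
The alter problem can be illustrated with a small sketch of a count check of this kind. It
is an assumed form of the validation, not the actual HBaseSerDe code: the class
ColumnCountCheck, the validate method, and the column and mapping names are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the actual HBaseSerDe validation code.
class ColumnCountCheck {

    // Assumed check: the Hive column count must equal the HBase mapping count.
    static void validate(List<String> hiveColumns, List<String> hbaseMapping) {
        if (hiveColumns.size() != hbaseMapping.size()) {
            throw new IllegalStateException(
                    hiveColumns.size() + " hive columns vs "
                    + hbaseMapping.size() + " hbase mapping columns");
        }
    }

    public static void main(String[] args) {
        // State after only the first 'alter' (Hive columns changed, mapping not yet).
        List<String> hiveColumns = Arrays.asList("key", "value", "extra");
        List<String> hbaseMapping = Arrays.asList("cf:key", "cf:value");
        try {
            validate(hiveColumns, hbaseMapping);
        } catch (IllegalStateException e) {
            System.out.println("alter rejected: " + e.getMessage());
        }
    }
}
```

Because the intermediate state always has mismatched counts, whichever 'alter' runs first is
rejected if the check runs on every SerDe initialization.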

2) More flexible schema mapping?
As Schubert mentioned before, a more flexible schema mapping would be useful for users. This
feature will be added later.


Comments are welcome~




> Let Hive can analyse hbase's tables
> -----------------------------------
>
>                 Key: HIVE-705
>                 URL: https://issues.apache.org/jira/browse/HIVE-705
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Samuel Guo
>         Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, HIVE-705_draft.patch, HIVE-705_revision806905.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored in hbase easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

