hadoop-hive-dev mailing list archives

From "schubert zhang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HIVE-705) Let Hive can analyse hbase's tables
Date Mon, 10 Aug 2009 17:26:15 GMT

    [ https://issues.apache.org/jira/browse/HIVE-705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12741431#action_12741431 ]

schubert zhang commented on HIVE-705:
-------------------------------------

Hi Samuel,

Thanks for your great work.
Your patch modifies many Java files, which is really a big effort. I wonder whether there is
any way to avoid such a large modification.

Regarding the schema mapping between an HBase table and a Hive SQL table, I have the following
considerations:
1. We just want to use HBase as a scalable structured data store, or key-value store.
2. In our past experience, performance is poor when each SQL column is mapped to its own HBase
column. For example, with a table of 20 columns, each read or write of a row comprises
20 key-value operations, which is inefficient.
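
To make the cost in point 2 concrete, here is a minimal sketch in plain Python that models HBase cells as tuples and does not use any HBase API. The function names, the `cf1:q1` target column, and the choice of Ctrl-A as the field delimiter (the default field delimiter of Hive's text SerDe) are assumptions made for illustration only.

```python
# Assumption: Ctrl-A separates packed fields, as in Hive's default text SerDe.
FIELD_DELIMITER = "\x01"


def cells_per_column(row_key, row, family="cf1"):
    """One cell per SQL column: (row key, 'family:qualifier', value)."""
    return [(row_key, f"{family}:{name}", str(value)) for name, value in row.items()]


def cells_packed(row_key, row, column="cf1:q1"):
    """All SQL columns serialized into a single cell."""
    packed = FIELD_DELIMITER.join(str(value) for value in row.values())
    return [(row_key, column, packed)]


def unpack(cell_value, names):
    """Split a packed cell value back into named (string) fields."""
    return dict(zip(names, cell_value.split(FIELD_DELIMITER)))


# A hypothetical row with 20 SQL columns, col1 .. col20.
row = {f"col{i}": i for i in range(1, 21)}

print(len(cells_per_column("row1", row)))  # 20 key-value pairs per row
print(len(cells_packed("row1", row)))      # 1 key-value pair per row
```

The packed form trades per-column access inside HBase for fewer key-value operations per row, which matches the use of HBase as a plain key-value store described in point 1.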

How about considering a more flexible schema mapping:
1. One HBase column can map to multiple Hive SQL columns with a SerDe, e.g. cf1:q1 =>
{(col1, col2, col3), Default SerDe}
2. One HBase column family can map to multiple Hive SQL columns with a SerDe, e.g. cf2: =>
{(col3, col5, col6), Default SerDe}
3. Your MAP column (in the Hive table) for a sparse column family. [Optional] Since Hive is a
structured data analysis front-end, we can omit this feature at the beginning.

For example:

CREATE EXTERNAL TABLE hive_table (pkey STRING, col1 STRING, col2 INT, col3 INT, col4 STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.MyHBaseSerDe'
WITH SERDEPROPERTIES (
"hbase.columns.mapping" = "cf1:(col1,col2,col3) with DefaultSerDe, cf2:c1 (col4) with DefaultSerDe"
)
STORED AS HBASETABLE
LOCATION '<hbase_table_location>'
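
As a rough illustration of how a SerDe might read the proposed "hbase.columns.mapping" value, here is a minimal Python sketch. The grammar is inferred only from the single example string above; the function name `parse_mapping` and the returned dictionary layout are hypothetical, and nothing here is part of Hive or of the attached patch.

```python
import re

# One mapping entry: family:[qualifier] (col, col, ...) with SerDeName
# This grammar is an assumption inferred from the example string only.
ENTRY = re.compile(
    r"\s*(?P<family>\w+):(?P<qualifier>\w*)\s*"
    r"\((?P<columns>[^)]*)\)\s*"
    r"with\s+(?P<serde>\w+)\s*"
)


def parse_mapping(spec):
    """Parse the proposed mapping string into a list of entry dicts."""
    entries = []
    pos = 0
    while pos < len(spec):
        match = ENTRY.match(spec, pos)
        if match is None:
            raise ValueError(f"cannot parse mapping at offset {pos}: {spec[pos:]!r}")
        columns = [c.strip() for c in match.group("columns").split(",") if c.strip()]
        entries.append({
            "family": match.group("family"),
            # An empty qualifier means the whole column family is mapped.
            "qualifier": match.group("qualifier") or None,
            "columns": columns,
            "serde": match.group("serde"),
        })
        pos = match.end()
        if pos < len(spec):
            if spec[pos] != ",":
                raise ValueError(f"expected ',' at offset {pos}")
            pos += 1  # skip the entry separator
    return entries


mapping = parse_mapping(
    "cf1:(col1,col2,col3) with DefaultSerDe, cf2:c1 (col4) with DefaultSerDe"
)
for entry in mapping:
    print(entry)
```

With the example string, the first entry maps the whole family cf1 to col1, col2 and col3, and the second maps the single column cf2:c1 to col4.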

Usually, we want a more advanced data store backend than HDFS, to achieve more flexible data
placement and indexing. HBase's data model meets this requirement well, but we may not need
the full features of HBase here.

--
I look forward to communicating more with you in Chinese, at your convenience.

Schubert

> Let Hive can analyse hbase's tables
> -----------------------------------
>
>                 Key: HIVE-705
>                 URL: https://issues.apache.org/jira/browse/HIVE-705
>             Project: Hadoop Hive
>          Issue Type: New Feature
>            Reporter: Samuel Guo
>         Attachments: hbase-0.19.3-test.jar, hbase-0.19.3.jar, HIVE-705_draft.patch
>
>
> Add a serde over the hbase's tables, so that hive can analyse the data stored in hbase easily.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

