hive-dev mailing list archives

From "Ning Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HIVE-7207) support partial rowkey scan in HBase filter pushdown
Date Tue, 10 Jun 2014 17:29:01 GMT

     [ https://issues.apache.org/jira/browse/HIVE-7207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ning Zhang updated HIVE-7207:
-----------------------------

    Description: 
One of our Hive tables is backed by HBase (HBaseStorageHandler). To simulate a Hive table
partitioned by "DataDate", we use a composite rowkey in HBase, e.g. DataDate_Userid_Actionid_Timestamp.
Example rowkeys are as follows.

rowkey:
20140601_784353454593233274_20123282_1401632522132
20140601_784353454_20123282_1401632522132
20140601_784470763593179377_20485247_1401632520825
20140601_784470763593233227_20485222_1401632520821
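
To make the key layout concrete, here is a minimal sketch that splits one of the rowkeys above into its four fields. The names RowKey and parse_rowkey are hypothetical and exist only for illustration; the sketch assumes "_" never appears inside a field and that the last field is a millisecond timestamp, neither of which is stated explicitly in this issue.

```python
from typing import NamedTuple


class RowKey(NamedTuple):
    """Hypothetical view of a DataDate_Userid_Actionid_Timestamp rowkey."""
    data_date: str
    user_id: str
    action_id: str
    timestamp_ms: int  # assumption: epoch milliseconds


def parse_rowkey(rowkey: str) -> RowKey:
    """Split a composite rowkey on '_' into its four fields."""
    parts = rowkey.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {rowkey!r}")
    data_date, user_id, action_id, timestamp = parts
    return RowKey(data_date, user_id, action_id, int(timestamp))


key = parse_rowkey("20140601_784353454_20123282_1401632522132")
print(key.data_date, key.user_id)
```

Because DataDate is the leading field, all rows for one day share a common rowkey prefix and sit next to each other in HBase's sorted key space.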

However, it seems Hive does not support a "partial rowkey scan". For example, I want to get all
data that was generated on 06/01/2014, so I issue the following Hive query, but Hive returns
nothing.

select * from table where DataDate="20140601";

After several attempts, I found that I have to give the exact row key (e.g. 20140601_784353454_20123282_1401632522132)
so that Hive can find that record.

The reason I want the "partial rowkey scan" feature is that, in HBase, a partial table
scan should perform better than a full table scan.
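
As a rough illustration of why a partial scan can be cheaper, the sketch below turns a rowkey prefix into a [start, stop) key range and applies it to a sorted list of keys, which stands in for HBase's sorted rowkey order. This is not Hive or HBase code: prefix_to_range and prefix_scan are hypothetical helpers, and the two extra keys for 20140531 and 20140602 are made up for the demonstration.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def prefix_to_range(prefix: bytes) -> Tuple[bytes, Optional[bytes]]:
    """Return (start_row, stop_row) covering exactly the keys that begin with prefix.

    stop_row is exclusive; None means "scan to the end of the table".
    """
    trimmed = prefix.rstrip(b"\xff")  # trailing 0xff bytes cannot be incremented
    if not trimmed:
        return prefix, None
    # Increment the last remaining byte to get the smallest key past the prefix.
    stop = trimmed[:-1] + bytes([trimmed[-1] + 1])
    return prefix, stop


def prefix_scan(sorted_keys: List[bytes], prefix: bytes) -> List[bytes]:
    """Simulate a range scan: binary-search both bounds, touch only that slice."""
    start, stop = prefix_to_range(prefix)
    lo = bisect_left(sorted_keys, start)
    hi = len(sorted_keys) if stop is None else bisect_left(sorted_keys, stop)
    return sorted_keys[lo:hi]


keys = sorted([
    b"20140531_784353454_20123282_1401546122132",  # made-up neighbour
    b"20140601_784353454593233274_20123282_1401632522132",
    b"20140601_784353454_20123282_1401632522132",
    b"20140601_784470763593179377_20485247_1401632520825",
    b"20140601_784470763593233227_20485222_1401632520821",
    b"20140602_784353454_20123282_1401718922132",  # made-up neighbour
])
for row in prefix_scan(keys, b"20140601_"):
    print(row.decode())
```

Only the four 20140601 keys fall inside the range, so the rows for other days are never visited. A full scan would instead read every row and discard the ones that do not match.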

Is there any plan in the Hive community to support "partial rowkey scan" in the near future?


> support partial rowkey scan in HBase filter pushdown
> ----------------------------------------------------
>
>                 Key: HIVE-7207
>                 URL: https://issues.apache.org/jira/browse/HIVE-7207
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Ning Zhang
>            Priority: Minor
>              Labels: Hbase, HbaseStorageHandler



--
This message was sent by Atlassian JIRA
(v6.2#6252)
