hive-dev mailing list archives

From "Teng Yutong (JIRA)" <>
Subject [jira] [Commented] (HIVE-6584) Add HiveHBaseTableSnapshotInputFormat
Date Thu, 12 Jun 2014 02:28:01 GMT


Teng Yutong commented on HIVE-6584:

Hi Nick,

I have some concerns about these patches:
1. HBaseStorageHandler.getInputFormatClass(): I am afraid the returned input format will
always be HiveHBaseTableInputFormat (at least according to my test).
2. In HBaseStorageHandler.preCreateTable, Hive checks whether the HBase table exists,
regardless of whether the external table being created is based on an actual table
or on a snapshot.
3. The TableSnapshotRegionSplit used in TableSnapshotInputFormat is a direct subclass of
InputSplit, not a subclass of TableSplit.
4. There is no public setScan method in TableSnapshotInputFormat.RecordReader; instead it
translates a string into a Scan instance using mapreduce.TableMapReduceUtil.convertStringToScan.
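
To illustrate points 3 and 4, here is a minimal sketch using stand-in types only. SplitStub, TableSplitStub, SnapshotRegionSplitStub, ScanStub and SnapshotReaderSketch are all hypothetical names, not the real Hadoop or HBase classes, and the Base64 encoding of two row keys is a toy format that does not reproduce what convertStringToScan actually parses. It only shows why a cast to the live-table split type cannot work for snapshot splits, and how a scan can be handed over as a string when there is no setter.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Stand-ins only: none of these are the real Hadoop or HBase classes.
abstract class SplitStub {}
class TableSplitStub extends SplitStub {}            // plays the role of TableSplit
class SnapshotRegionSplitStub extends SplitStub {}   // sibling of TableSplitStub, not a subclass

// Toy scan with just a start and stop row (assumed to contain no newline).
class ScanStub {
    final String startRow;
    final String stopRow;

    ScanStub(String startRow, String stopRow) {
        this.startRow = startRow;
        this.stopRow = stopRow;
    }
}

class SnapshotReaderSketch {
    // Point 4: with no setter available, the scan travels as an encoded string.
    static String scanToString(ScanStub scan) {
        String raw = scan.startRow + "\n" + scan.stopRow;
        return Base64.getEncoder().encodeToString(raw.getBytes(StandardCharsets.UTF_8));
    }

    static ScanStub stringToScan(String encoded) {
        String raw = new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
        String[] parts = raw.split("\n", -1);
        return new ScanStub(parts[0], parts[1]);
    }

    // Point 3: code that assumes every split is a table split must check first,
    // because a blind cast of a snapshot split would throw ClassCastException.
    static boolean isLiveTableSplit(SplitStub split) {
        return split instanceof TableSplitStub;
    }
}
```

With these stand-ins, a scan round-trips through scanToString and stringToScan, and isLiveTableSplit returns false for a SnapshotRegionSplitStub, which is the situation any code written against the live-table split type would have to handle.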

So I suggest adding a subclass of HBaseStorageHandler (and other necessary classes), say HBaseSnapshotStorageHandler,
to handle the HBase snapshot case.
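
A rough sketch of the shape such a subclass could take, written against stand-in types so it compiles on its own. BaseHandlerSketch, SnapshotHandlerSketch, the two input format stubs and the "example.snapshot.name" property are all hypothetical and are not taken from any of the attached patches. The real handler would return Hive input format classes and talk to HBase in preCreateTable.

```java
import java.util.Map;

// Stand-ins for the real input formats, so the sketch is self-contained.
interface InputFormatStub {}
class LiveTableInputFormatStub implements InputFormatStub {}   // role of HiveHBaseTableInputFormat
class SnapshotInputFormatStub implements InputFormatStub {}    // role of a snapshot input format

// Stand-in for the existing handler: always the live-table format,
// always checks that the live table exists (concerns 1 and 2).
class BaseHandlerSketch {
    protected final Map<String, String> tableProps;
    boolean liveTableChecked = false;

    BaseHandlerSketch(Map<String, String> tableProps) {
        this.tableProps = tableProps;
    }

    Class<? extends InputFormatStub> getInputFormatClass() {
        return LiveTableInputFormatStub.class;
    }

    void preCreateTable() {
        // The real handler would ask HBase whether the table exists here.
        liveTableChecked = true;
    }
}

// Hypothetical subclass for snapshot-backed tables.
class SnapshotHandlerSketch extends BaseHandlerSketch {
    // Hypothetical table property naming the snapshot to read.
    static final String SNAPSHOT_NAME_KEY = "example.snapshot.name";

    SnapshotHandlerSketch(Map<String, String> tableProps) {
        super(tableProps);
    }

    @Override
    Class<? extends InputFormatStub> getInputFormatClass() {
        return SnapshotInputFormatStub.class;
    }

    @Override
    void preCreateTable() {
        // Validate the snapshot name instead of checking for a live table.
        String name = tableProps.get(SNAPSHOT_NAME_KEY);
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException("missing " + SNAPSHOT_NAME_KEY);
        }
    }
}
```

Keeping the snapshot behaviour in its own subclass leaves the existing handler untouched, at the cost of users having to name a different handler class in the table definition.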

In fact, I have already finished the necessary code changes and run some tests, which
show that my modification works.

I will upload my patch soon.

> Add HiveHBaseTableSnapshotInputFormat
> -------------------------------------
>                 Key: HIVE-6584
>                 URL:
>             Project: Hive
>          Issue Type: Improvement
>          Components: HBase Handler
>            Reporter: Nick Dimiduk
>            Assignee: Nick Dimiduk
>             Fix For: 0.14.0
>         Attachments: HIVE-6584.0.patch, HIVE-6584.1.patch, HIVE-6584.2.patch, HIVE-6584.3.patch
> HBASE-8369 provided mapreduce support for reading from HBase table snapshots. This allows
a MR job to consume a stable, read-only view of an HBase table directly off of HDFS. Bypassing
the online region server API provides a nice performance boost for the full scan. HBASE-10642
is backporting that feature to 0.94/0.96 and also adding a {{mapred}} implementation. Once
that's available, we should add an input format. A follow-on patch could work out how to integrate
this functionality into the StorageHandler, similar to how HIVE-6473 integrates the HFileOutputFormat
into existing table definitions.

This message was sent by Atlassian JIRA
