hbase-user mailing list archives

From Josh Elser <els...@apache.org>
Subject Re: Incremental import from HBase to Hive
Date Sat, 28 Jan 2017 19:34:40 GMT
(Please stop adding the dev@hbase mailing list. This is a question for 
the user@ list only.)

Unless you have a time component included in your HBase data, the only 
way to find all "new" data using the cell timestamp is to scan the 
entire HBase table. A full table scan is not ideal, as it is not an 
access pattern HBase is optimized for.

You can consider including a leading time component in your rowKey, or 
creating an index table that maps load time to rowKey, to perform these 
lookups efficiently.
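
As a rough illustration of both options, here is a minimal Python sketch of the key layout only. It does not talk to HBase; the function names, the 8-byte big-endian millisecond prefix, and the entity ids are all assumptions made for the example, not anything prescribed by HBase. The point is that HBase orders rows by the raw bytes of the rowKey, so a fixed-width, big-endian time prefix turns "everything loaded this week" into a bounded range scan instead of a full table scan.

```python
import struct
from datetime import datetime, timedelta, timezone
from typing import Tuple

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(moment: datetime) -> int:
    """Whole milliseconds since the Unix epoch, using exact integer math."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


def time_prefix(moment: datetime) -> bytes:
    """8-byte big-endian prefix; byte order matches chronological order."""
    return struct.pack(">Q", to_millis(moment))


def make_row_key(load_time: datetime, entity_id: str) -> bytes:
    """Option 1 (hypothetical layout): time prefix followed by the entity id."""
    return time_prefix(load_time) + entity_id.encode("utf-8")


def make_index_key(load_time: datetime, data_row_key: bytes) -> bytes:
    """Option 2 (hypothetical layout): an index-table key that embeds the
    rowKey of the data table, which keeps its original key unchanged."""
    return time_prefix(load_time) + data_row_key


def data_key_from_index(index_key: bytes) -> bytes:
    """Recover the data-table rowKey by dropping the 8-byte time prefix."""
    return index_key[8:]


def scan_bounds(start: datetime, end: datetime) -> Tuple[bytes, bytes]:
    """Start row (inclusive) and stop row (exclusive) covering [start, end)."""
    return time_prefix(start), time_prefix(end)


# Example: bounds for one weekly run.
week_start = datetime(2017, 1, 21, tzinfo=timezone.utc)
week_end = week_start + timedelta(days=7)
start_row, stop_row = scan_bounds(week_start, week_end)
```

A weekly job would pass `start_row` and `stop_row` as the start and stop rows of its scan, and use the previous run's `week_end` as the next `week_start` so that no window is read twice. One caveat with this layout: monotonically increasing rowKeys concentrate writes on a single region, so a real design may need a salt or bucket prefix in front of the time component.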

Chetan Khatri wrote:
> Sure. There are several applications that talk to HBase and populate data.
> Now I want to load data incrementally from HBase, apply transformations
> such as data quality filters, and save the result to Hive.
> Incremental load means that I want to run this job weekly while making
> sure there is no duplication at the Hive level.
> Thanks.
> On Sat, Jan 28, 2017 at 1:00 AM, Josh Elser <elserj@apache.org> wrote:
>> (-cc dev)
>> Might you be able to be more specific in the context of your question?
>> What kind of requirements do you have?
>> Chetan Khatri wrote:
>>> Hello Community,
>>> I am working with HBase 1.2.4. What would be the best approach to do
>>> an incremental load from HBase to Hive?
>>> Thanks.
