hbase-dev mailing list archives

From Chetan Khatri <chetan.opensou...@gmail.com>
Subject Re: Approach: Incremental data load from HBASE
Date Fri, 23 Dec 2016 18:27:30 GMT
Ted, correct. In my case I want incremental import from HBase and
incremental load into Hive. Both approaches discussed earlier, with indexing,
seem accurate to me. But just as Sqoop supports incremental import and load
for RDBMS, is there any tool that supports incremental import from HBase?



On Wed, Dec 21, 2016 at 10:04 PM, Ted Yu <yuzhihong@gmail.com> wrote:

> Incremental load traditionally means generating hfiles and
> using org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles to load the
> data into hbase.
>
> For your use case, the producer needs to find rows where the flag is 0 or
> 1.
> After such rows are obtained, it is up to you how the result of processing
> is delivered to hbase.
>
> Cheers
>
> On Wed, Dec 21, 2016 at 8:00 AM, Chetan Khatri <
> chetan.opensource@gmail.com> wrote:
>
>> OK, sure, I will ask.
>>
>> But what would be the generic best-practice solution for incremental load
>> from HBase?
>>
>> On Wed, Dec 21, 2016 at 8:42 PM, Ted Yu <yuzhihong@gmail.com> wrote:
>>
>>> I haven't used Gobblin.
>>> You can consider asking the Gobblin mailing list about the first option.
>>>
>>> The second option would work.
>>>
>>>
>>> On Wed, Dec 21, 2016 at 2:28 AM, Chetan Khatri <
>>> chetan.opensource@gmail.com> wrote:
>>>
>>>> Hello Guys,
>>>>
>>>> I would like to understand the different approaches for distributed
>>>> incremental load from HBase. Is there any *tool / incubator tool* that
>>>> satisfies this requirement?
>>>>
>>>> *Approach 1:*
>>>>
>>>> Write a Kafka producer, manually maintain a column flag for events, and
>>>> ingest them with LinkedIn Gobblin to HDFS / S3.
>>>>
>>>> *Approach 2:*
>>>>
>>>> Run a scheduled Spark job: read from HBase, do transformations, and
>>>> maintain a flag column at the HBase level.
>>>>
>>>> In both approaches above, I need to maintain column-level flags, such as
>>>> 0 - default, 1 - sent, 2 - sent and acknowledged, so that next time the
>>>> producer will take another batch of 1000 rows where the flag is 0 or 1.
>>>>
>>>> I am looking for a best-practice approach with any distributed tool.
>>>>
>>>> Thanks.
>>>>
>>>> - Chetan Khatri
>>>>
>>>
>>>
>>
>
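
The flag scheme described in the original message (0 = default, 1 = sent, 2 = sent and acknowledged, with the producer taking the next batch of rows whose flag is 0 or 1) can be sketched as below. This is only a minimal illustration under stated assumptions: the table is a plain in-memory dict standing in for HBase, and the names Row, next_batch, mark_sent and mark_acked are hypothetical helpers, not part of any HBase, Kafka, Gobblin or Spark API.

```python
from dataclasses import dataclass
from typing import Dict, List

# Flag values as described in the thread.
FLAG_DEFAULT = 0  # row has never been sent
FLAG_SENT = 1     # row was sent but not yet acknowledged
FLAG_ACKED = 2    # row was sent and acknowledged


@dataclass
class Row:
    key: str
    payload: str
    flag: int = FLAG_DEFAULT


def next_batch(table: Dict[str, Row], batch_size: int = 1000) -> List[Row]:
    """Return up to batch_size rows whose flag is 0 or 1, in row-key order."""
    pending = [
        table[key]
        for key in sorted(table)
        if table[key].flag in (FLAG_DEFAULT, FLAG_SENT)
    ]
    return pending[:batch_size]


def mark_sent(rows: List[Row]) -> None:
    """Record that the rows were handed to the downstream consumer."""
    for row in rows:
        row.flag = FLAG_SENT


def mark_acked(rows: List[Row]) -> None:
    """Record that the downstream consumer confirmed the rows."""
    for row in rows:
        row.flag = FLAG_ACKED


# Hypothetical in-memory stand-in for an HBase table with five rows.
table = {f"row{i:04d}": Row(f"row{i:04d}", f"value-{i}") for i in range(5)}

batch = next_batch(table, batch_size=3)  # first three rows, all flag 0
mark_sent(batch)
mark_acked(batch[:2])                    # assume only two acknowledgements arrive

# The unacknowledged row is picked up again, ahead of the unsent rows.
retry = next_batch(table, batch_size=3)
```

Because rows with flag 1 are selected again, a row whose acknowledgement was lost may be delivered more than once, so the consumer in this sketch would need to tolerate duplicates.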
