hbase-user mailing list archives

From gordoslocos <gordoslo...@gmail.com>
Subject Re: Bulk Loads and Updates
Date Wed, 03 Oct 2012 20:22:25 GMT
Thank you Paul.

I was just thinking that I could add a reducer to the step that prepares the data, to build
custom logic around having multiple entries which produce the same rowkey. What do you think?
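
A minimal sketch of what that reducer-side logic could look like, with all the Hadoop/HBase plumbing omitted. The names here (`pickLatest`, `Versioned`) are illustrative only, not HBase API; the idea is just that the reducer sees every record for a rowkey and emits a single winner before the HFiles are written:

```java
import java.util.*;

// Hypothetical reducer-side dedup: all records sharing a rowkey arrive at
// one reducer, which keeps only the record with the highest timestamp so
// the bulk-load output contains a single cell per rowkey+column.
public class DedupSketch {
    // a candidate value together with the timestamp its source record carried
    record Versioned(long ts, String value) {}

    // keep only the newest value per rowkey — the "custom logic" a reducer
    // could apply before writing KeyValues for the bulk load
    static Map<String, String> pickLatest(Map<String, List<Versioned>> byRow) {
        Map<String, String> out = new HashMap<>();
        for (var e : byRow.entrySet()) {
            Versioned newest = Collections.max(e.getValue(),
                    Comparator.comparingLong(Versioned::ts));
            out.put(e.getKey(), newest.value());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<Versioned>> input = Map.of(
            "row1", List.of(new Versioned(1, "old"), new Versioned(5, "new")),
            "row2", List.of(new Versioned(3, "only")));
        System.out.println(pickLatest(input));
    }
}
```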

Sent from my iPhone

On 03/10/2012, at 17:12, Paul Mackles <pmackles@adobe.com> wrote:

> Keys in hbase are a combination of rowkey/column/timestamp.
> 
> Two records with the same rowkey but different columns will result in two
> different cells with the same rowkey, which is probably what you expect.
> 
> For two records with the same rowkey and same column, the timestamp will
> normally differentiate them, but in the case of a bulk load the timestamps
> could be the same, so it may actually be a tie and both will be stored.
> There are no updates in bulk loads.
> 
> All 20 versions will get loaded but the 10 oldest will be deleted during
> the next major compaction.
> 
> I would definitely recommend setting up small scale tests for all of the
> above scenarios to confirm.
> 
> On 10/3/12 3:35 PM, "Juan P." <gordoslocos@gmail.com> wrote:
> 
>> Hi guys,
>> I've been reading up on bulk load using MapReduce jobs and I wanted to
>> validate something.
>> 
>> If the input I wanted to load into HBase produced the same key for
>> several lines, how would HBase handle that?
>> 
>> I understand the MapReduce job will create StoreFiles which the region
>> servers just pick up and make available to the users. But is there any
>> validation to treat the first as an insert and the rest as updates?
>> 
>> What about the limit on the number of versions of a key HBase can have? If
>> I want to have 10 versions, but the bulk load has 20 values for the same
>> key, will it only keep the last 10?
>> 
>> Thanks,
>> Juan
> 
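
To illustrate Paul's point about the version limit: the bulk load writes every version into the StoreFile, and the column family's VERSIONS setting (10 in the example) is only enforced when a major compaction rewrites the file. A small sketch of that retention rule under those assumptions (`retainNewest` is a made-up name, not HBase API):

```java
import java.util.*;
import java.util.stream.*;

// Simulates what a major compaction does for one cell: of all the stored
// versions, keep only the maxVersions newest timestamps and drop the rest.
public class VersionRetention {
    static List<Long> retainNewest(List<Long> timestamps, int maxVersions) {
        return timestamps.stream()
                .sorted(Comparator.reverseOrder())   // newest first
                .limit(maxVersions)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // 20 versions bulk-loaded, VERSIONS=10: after major compaction
        // only timestamps 20 down to 11 survive
        List<Long> loaded = LongStream.rangeClosed(1, 20).boxed().toList();
        System.out.println(retainNewest(loaded, 10));
    }
}
```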
