hbase-user mailing list archives

From Doug Meil <doug.m...@explorysmedical.com>
Subject Re: Bulk Loads and Updates
Date Wed, 03 Oct 2012 22:56:22 GMT

Hi there-

re:  "All 20 versions will get loaded but the 10 oldest will be deleted
during the next major compaction."

Yep, that's what is expected to happen.
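
To make that behavior concrete, here is a toy model in Python. It is only a sketch and not HBase code: the names bulk_load, major_compact and MAX_VERSIONS are made up for illustration, the limit of 10 is taken from the question, and it assumes every cell has a distinct timestamp (so it says nothing about ties).

```python
from collections import defaultdict

MAX_VERSIONS = 10  # assumed column-family setting, taken from the question


def bulk_load(store, records):
    """Append every cell as-is; nothing is treated as an update."""
    for row, column, ts, value in records:
        store[(row, column)].append((ts, value))


def major_compact(store, max_versions=MAX_VERSIONS):
    """Keep only the newest max_versions cells per (row, column)."""
    for cells in store.values():
        cells.sort(key=lambda cell: cell[0], reverse=True)  # newest first
        del cells[max_versions:]


store = defaultdict(list)

# 20 values for the same row and column, with timestamps 1..20.
records = [("row1", "cf:a", ts, f"v{ts}") for ts in range(1, 21)]

bulk_load(store, records)
loaded = len(store[("row1", "cf:a")])  # all 20 cells are present after the load

major_compact(store)
kept = [ts for ts, _ in store[("row1", "cf:a")]]  # only the 10 newest remain

print(loaded, kept)
```

In this model the load itself never drops anything; the version limit only takes effect in the compaction step, which is the behavior described above.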

For information on the KeyValue structure and the compaction algorithm, see [link missing].

For info on bulk loading, see [link missing].


On 10/3/12 4:12 PM, "Paul Mackles" <pmackles@adobe.com> wrote:

>Keys in hbase are a combination of rowkey/column/timestamp.
>Two records with the same rowkey but different column will result in two
>different cells with the same rowkey which is probably what you expect.
>For two records with the same rowkey and same column, the timestamp will
>normally differentiate them, but in the case of a bulk load the timestamps
>could be the same, so it may actually be a tie and both will be stored.
>There are no updates in bulk loads.
>All 20 versions will get loaded but the 10 oldest will be deleted during
>the next major compaction.
>I would definitely recommend setting up small scale tests for all of the
>above scenarios to confirm.
>On 10/3/12 3:35 PM, "Juan P." <gordoslocos@gmail.com> wrote:
>>Hi guys,
>>I've been reading up on bulk load using MapReduce jobs and I wanted to
>>validate something.
>>If the input I wanted to load into HBase produced the same key for
>>several lines, how will HBase handle that?
>>I understand the MapReduce job will create StoreFiles which the region
>>servers just pick up and make available to the users. But is there a
>>validation to treat the first as insert and the rest as updates?
>>What about the limit on the number of versions of a key HBase can have?
>>I want to have 10 versions, but the bulk load has 20 values for the same
>>key. Will it only keep the last 10?
