hbase-user mailing list archives

From Shrijeet Paliwal <shrij...@rocketfuel.com>
Subject Re: [HBase 0.92.1] Too many stores files to compact, compaction moving slowly
Date Mon, 14 May 2012 22:03:44 GMT
These M files will have to contain globally sorted entries (the first
entry in the 0th file will be the smallest key and the last entry of the
(M-1)th file will be the largest key), no?
configureIncrementalLoad achieves this by peeking into the existing table
and preparing a partition file that enforces total order (it reads the
split points via table.getStartKeys()).

Like you said, in my case the table will be created after the MR job
completes. So I guess what I need to do is come up with a split file and
give it to both the MR job's partitioner and the create table command
(create 't1', 'f1', {SPLITS_FILE => 'splits.txt'}). Finally, use bulk
import.
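
A minimal sketch of the plan above in plain Python, not HBase or Hadoop
code: one sorted list of split points drives both the partitioning and
the splits file. The function names and sample keys are hypothetical,
and the one-split-point-per-line file layout is an assumption about what
SPLITS_FILE expects. In a real job the partitioning step would be done
by the MR partitioner reading its partition file.

```python
import bisect


def partition_for(row_key, split_points):
    """Return the index of the partition (region) that should hold row_key.

    split_points must be sorted; N split points define N + 1 partitions.
    A key equal to a split point goes to the partition that starts at it.
    """
    return bisect.bisect_right(split_points, row_key)


def write_splits_file(split_points, path):
    """Write one split point per line (assumed SPLITS_FILE layout)."""
    with open(path, "w") as f:
        for point in split_points:
            f.write(point + "\n")


def partition_and_sort(row_keys, split_points):
    """Simulate the shuffle: bucket keys by partition, then sort each bucket."""
    buckets = [[] for _ in range(len(split_points) + 1)]
    for key in row_keys:
        buckets[partition_for(key, split_points)].append(key)
    return [sorted(bucket) for bucket in buckets]


# Hypothetical sample data: 3 split points give 4 partitions.
split_points = ["g", "n", "t"]
keys = ["zebra", "apple", "n", "mango", "horse", "tiger", "berry"]
buckets = partition_and_sort(keys, split_points)

# Reading the buckets in partition order yields a globally sorted sequence.
flattened = [key for bucket in buckets for key in bucket]
```

Because every key in partition i sorts before every key in partition
i+1, each reducer's output can line up with exactly one region of a
table created from the same split points.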

That is, unless there is a way for bulk import to enforce total order
even if the MR output is not ordered that way. Coming up with this file
beforehand is not a problem in my case, but I just want to check that I
am getting your point correctly.

Thanks Stack.

-Shrijeet

On Mon, May 14, 2012 at 2:46 PM, Stack <stack@duboce.net> wrote:
> On Mon, May 14, 2012 at 2:11 PM, Shrijeet Paliwal
> <shrijeet@rocketfuel.com> wrote:
>> Ahh of course! Thank you. One question: what partition file do I give to
>> the top partitioner?
>> I am trying to parse your last comment.
>> "You could figure how many you need by looking at the output of your MR job"
>>
>> Chicken and egg? Or am I not following you correctly.
>>
>
> I was thinking that your MR job would not look to a table at all to
> figure where to partition the data.  Rather, your reducer would write
> out files of size N where size N is just under your region max file
> size.  After the MR is done, you'll then have M files.  You'll need to
> create a table w/ M region boundaries (or M+1?) to match the files
> produced (HFiles write out their first and last keys in metadata
> IIRC).  You'll have to override the likes of the
> configureIncrementalLoad in HFileOutputFormat methinks.
>
> It's just a suggestion.  I've not dug in on viability.
>
> St.Ack
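
One way to read the suggestion quoted above, sketched in plain Python:
assuming the first and last key of each output file have already been
read from its metadata (that step is not shown, and
split_keys_from_files is a hypothetical helper), M non-overlapping files
need M - 1 split keys, because the first region's start key is empty.

```python
def split_keys_from_files(file_key_ranges):
    """Derive table split keys from per-file (first_key, last_key) pairs.

    Returns M - 1 split keys for M files: the first key of every file
    except the lowest one. Raises ValueError if two files overlap, since
    overlapping files cannot each map to a single region.
    """
    ranges = sorted(file_key_ranges)
    for (_, prev_last), (next_first, _) in zip(ranges, ranges[1:]):
        if prev_last >= next_first:
            raise ValueError("files overlap; output is not totally ordered")
    return [first for first, _ in ranges[1:]]


# Hypothetical key ranges for M = 3 output files, listed out of order.
ranges = [("m", "r"), ("a", "f"), ("s", "z")]
split_keys = split_keys_from_files(ranges)
```

The overlap check is the part that matters for the total-order question:
if it fails, the files were not produced by a total-order partitioning
and no set of split keys will map each file to one region.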
