hbase-user mailing list archives

From Renaud Delbru <renaud.del...@deri.org>
Subject Re: Creating a Table using HFileOutputFormat
Date Fri, 24 Sep 2010 16:12:06 GMT
  On 24/09/10 16:55, Ted Yu wrote:
>  From TotalOrderPartitioner:
>        K[] splitPoints = readPartitions(fs, partFile, keyClass, conf);
>        if (splitPoints.length != job.getNumReduceTasks() - 1) {
> The partition list can be empty if you use 1 reducer.
> But that is not what you want, I guess.
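The quoted check means that N reduce tasks need exactly N - 1 split points, so a single reducer needs none. The following Java sketch shows how N - 1 sorted split points can route every key into one of N partitions with a binary search. It is not Hadoop's TotalOrderPartitioner: the SplitPointRouting class and its checkSplitPoints and partitionFor methods are hypothetical. It also assumes that a key equal to a split point opens the next partition, the same way an HBase region start key is inclusive.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Hadoop's TotalOrderPartitioner) of how N - 1 sorted
 * split points divide the key space into N partitions, one per reducer.
 */
public class SplitPointRouting {

    /** Same invariant as the quoted check: N reducers need exactly N - 1 split points. */
    static void checkSplitPoints(String[] splitPoints, int numReduceTasks) {
        if (splitPoints.length != numReduceTasks - 1) {
            throw new IllegalArgumentException("Wrong number of partitions in keyset: "
                    + splitPoints.length + " (expected " + (numReduceTasks - 1) + ")");
        }
    }

    /**
     * Returns the partition index for key. Partition i holds keys in
     * [splitPoints[i - 1], splitPoints[i]); a key equal to a split point
     * starts the next partition, like an inclusive region start key.
     */
    static int partitionFor(String key, String[] splitPoints) {
        int idx = Arrays.binarySearch(splitPoints, key);
        // Found: key equals splitPoints[idx], so it opens partition idx + 1.
        // Not found: idx == -(insertionPoint) - 1, and the insertion point is the partition.
        return idx >= 0 ? idx + 1 : -(idx + 1);
    }

    public static void main(String[] args) {
        String[] splitPoints = {"g", "n", "t"}; // 3 split points -> 4 reducers/regions
        checkSplitPoints(splitPoints, 4);
        for (String key : new String[] {"apple", "g", "kiwi", "t", "zebra"}) {
            System.out.println(key + " -> partition " + partitionFor(key, splitPoints));
        }
    }
}
```

Running main maps apple to partition 0, g and kiwi to partition 1, and t and zebra to partition 3. With an empty split-point array (one reducer), every key lands in partition 0, which is the degenerate case described in the quote.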
Yes, a single reducer is not what we want, since we want to create x regions.
However, we just found that the Hadoop library has a tool for this task,
InputSampler. It samples an arbitrary dataset and creates the partition
split points. We will try this approach first. My guess is that, even
though these partitions are only an approximation, this should be fine for
HBase. The regions will not all be the same size, but that should not be a
problem, since HBase will split the larger regions into smaller ones
first. Can somebody confirm this assumption?
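The sampling idea behind InputSampler can be illustrated with a simplified, self-contained Java sketch. It is not Hadoop's code: the SampledSplitPoints class, its sample and splitPoints methods, the row-%04d keys and the 10% sampling rate are all hypothetical. The sketch keeps a random subset of keys, sorts that sample, and takes numPartitions - 1 evenly spaced, strictly increasing keys as split points. Counting the full key set per resulting region shows sizes that are close to, but not exactly, equal, which is the approximation discussed above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch of sampling-based split-point selection, in the spirit of
 * Hadoop's InputSampler; not its actual implementation.
 */
public class SampledSplitPoints {

    /** Keeps each key with probability freq (a stand-in for a random sampler). */
    static List<String> sample(List<String> keys, double freq, long seed) {
        Random rnd = new Random(seed);
        List<String> out = new ArrayList<>();
        for (String k : keys) {
            if (rnd.nextDouble() < freq) {
                out.add(k);
            }
        }
        return out;
    }

    /**
     * Sorts the samples and picks up to numPartitions - 1 evenly spaced,
     * strictly increasing split points. Fewer are returned if the sample
     * has too few distinct keys.
     */
    static String[] splitPoints(List<String> samples, int numPartitions) {
        String[] sorted = samples.toArray(new String[0]);
        Arrays.sort(sorted);
        List<String> points = new ArrayList<>();
        float step = sorted.length / (float) numPartitions;
        int last = -1;
        for (int i = 1; i < numPartitions; i++) {
            int k = Math.max(Math.round(step * i), last + 1);
            // Skip duplicates of the previous split point so points stay strictly increasing.
            while (last >= 0 && k < sorted.length && sorted[k].equals(sorted[last])) {
                k++;
            }
            if (k >= sorted.length) {
                break;
            }
            points.add(sorted[k]);
            last = k;
        }
        return points.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            keys.add(String.format("row-%04d", i));
        }
        List<String> samples = sample(keys, 0.1, 42L);
        String[] points = splitPoints(samples, 4);
        System.out.println("split points: " + Arrays.toString(points));

        // Route every full key to a region with a binary search over the split points.
        int[] counts = new int[points.length + 1];
        for (String k : keys) {
            int idx = Arrays.binarySearch(points, k);
            counts[idx >= 0 ? idx + 1 : -(idx + 1)]++;
        }
        System.out.println("keys per region: " + Arrays.toString(counts));
    }
}
```

Skipping duplicates matters for row keys that repeat in the sample: two equal split points would produce an empty region, and HBase region boundaries must be strictly increasing.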
Renaud Delbru
