hbase-user mailing list archives

From Leo Alekseyev <dnqu...@gmail.com>
Subject Re: Bulk import tools for HBase
Date Fri, 22 Oct 2010 23:19:54 GMT
What about the opposite problem? Suppose we are bulk-populating a
blank table from scratch: the table has a single region, so all of the
data goes into that one region through one reducer.  One workaround is
to import some data, then split the region into however many regions we
want, then import the rest.  This sounds kludgy.  Is there a better
approach?
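
One direction that may avoid the import-then-split step is to create the table with its regions already split, so the bulk load starts with several regions and therefore several reducers. The sketch below only computes evenly spaced split keys; it assumes row keys are spread uniformly over their first byte, which is not stated anywhere in this thread. The class and method names (SplitKeys, uniformSplitKeys) are hypothetical. The resulting keys would then be passed to whatever table-creation call accepts split keys in your HBase version (HBaseAdmin has had a createTable overload taking a byte[][] of split keys, but check your release).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HBase: computes split keys for a blank table.
class SplitKeys {

    // Returns numRegions - 1 single-byte split keys that divide the
    // first-byte key space [0x00, 0xFF] into numRegions roughly equal parts.
    // Assumption: row keys are uniformly distributed over their first byte.
    static List<byte[]> uniformSplitKeys(int numRegions) {
        if (numRegions < 1 || numRegions > 256) {
            throw new IllegalArgumentException("numRegions must be in [1, 256]");
        }
        List<byte[]> keys = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            int boundary = (i * 256) / numRegions; // start of region i
            keys.add(new byte[] { (byte) boundary });
        }
        return keys;
    }

    public static void main(String[] args) {
        // Print the split keys for a 4-region table as unsigned hex bytes.
        for (byte[] key : uniformSplitKeys(4)) {
            System.out.printf("0x%02X%n", key[0] & 0xff);
        }
    }
}
```

If the keys are not uniform (timestamps, sequential IDs), the boundaries would have to come from a sample of the real data instead.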


On Wed, Oct 13, 2010 at 5:39 AM, Todd Lipcon <todd@cloudera.com> wrote:
> On Mon, Oct 11, 2010 at 9:33 PM, Sean Bigdatafun
> <sean.bigdatafun@gmail.com> wrote:
>> Another potential "problem" of the incremental bulk loader is that the number
>> of reducers (for the bulk loading process) needs to be equal to the number of
>> existing regions -- this seems to be infeasible for a very large table, say
>> with 2000 regions.
>> Any comment on this? Thanks.
> Yes, this is currently problematic if you have a very large table
> (2000 regions) and a small MR cluster (where 2000 reducers is too
> many).
> It wouldn't be too difficult to amend the code so that each reducer is
> responsible for a contiguous range of regions, and knows to split the
> HFiles at region boundaries. Patches welcome :)
> -Todd
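
The idea of giving each reducer a contiguous range of regions can be sketched as a partitioning function. This is only an illustration and not the HBase bulk loader's code: the class and method names (RegionGrouping, regionForRow, reducerForRegion) are hypothetical, the region start keys are assumed to be sorted with an empty first key, and the second half of the suggestion (splitting the HFiles at region boundaries inside each reducer) is not shown.

```java
// Hypothetical sketch, not part of HBase: maps rows to regions and
// regions to reducers so that each reducer covers a contiguous block.
class RegionGrouping {

    // Unsigned lexicographic comparison of two byte arrays.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }

    // Index of the region containing row: the last start key <= row.
    // Assumption: startKeys is sorted and startKeys[0] is the empty key.
    static int regionForRow(byte[][] startKeys, byte[] row) {
        int lo = 0;
        int hi = startKeys.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (compare(startKeys[mid], row) <= 0) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    // Maps a region index to a reducer. The mapping is non-decreasing in
    // regionIndex, so every reducer receives a contiguous run of regions.
    static int reducerForRegion(int regionIndex, int numRegions, int numReducers) {
        return (int) ((long) regionIndex * numReducers / numRegions);
    }
}
```

With 2000 regions and 50 reducers, for example, this mapping sends regions 0 to 39 to reducer 0, regions 40 to 79 to reducer 1, and so on, so each reducer handles 40 adjacent regions.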
