hbase-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Bulk import tools for HBase
Date Wed, 13 Oct 2010 12:39:41 GMT
On Mon, Oct 11, 2010 at 9:33 PM, Sean Bigdatafun
<sean.bigdatafun@gmail.com> wrote:
> Another potential "problem" of incremental bulk loader is that the number of
> reducers (for the bulk loading process) needs to be equal to the existing
> regions -- this seems to be unfeasible for very large table, say with 2000
> regions.
>
> Any comment on this? Thanks.

Yes, this is currently problematic if you have a very large table
(2000 regions) and a small MR cluster (where 2000 reducers is too
many).

It wouldn't be too difficult to amend the code so that each reducer is
responsible for a contiguous range of regions, and knows to split the
HFiles at region boundaries. Patches welcome :)
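To make the idea concrete, here is a rough sketch (plain Python, illustrative only; the function name `assign_regions` and the even-split policy are my own assumptions, not what an actual patch would have to do). Each reducer r would own the contiguous region indices in ranges[r] and split its output HFiles at those region boundaries:

```python
def assign_regions(num_regions, num_reducers):
    """Map each reducer to a contiguous range of region indices,
    spreading any remainder so range sizes differ by at most one.

    Returns a list of (start, end) half-open intervals, one per reducer.
    """
    base, extra = divmod(num_regions, num_reducers)
    ranges = []
    start = 0
    for r in range(num_reducers):
        size = base + (1 if r < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges

# e.g. 2000 regions on a cluster that can only run 100 reducers:
# each reducer covers 20 consecutive regions instead of one.
```

With a scheme like this, the 2000-region / small-cluster case above stops being a problem: the reducer count becomes a tunable knob, and the per-region HFile splitting happens inside each reducer rather than being forced by the partitioning.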

-Todd

>
> Sean
>
> On Fri, Oct 8, 2010 at 9:03 PM, Todd Lipcon <todd@cloudera.com> wrote:
>
>> What version are you building from? These tools are new as of this past
>> June.
>>
>> -Todd
>>
>> On Fri, Oct 8, 2010 at 4:52 PM, Leo Alekseyev <dnquark@gmail.com> wrote:
>>
>> > We want to investigate HBase bulk imports, as described on
>> > http://hbase.apache.org/docs/r0.89.20100726/bulk-loads.html and/or
>> > JIRA HBASE-48.  I can't seem to run either the importtsv tool or the
>> > completebulkload tool using the hadoop jar /path/to/hbase-VERSION.jar
>> > command.  In fact, the ImportTsv class is not part of that jar file.
>> > Am I looking in the wrong place for this class, or do I need to
>> > somehow customize the build process to include it?  Our HBase was
>> > built from source using the default procedure.
>> >
>> > Thanks for any insight,
>> > --Leo
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>
>



-- 
Todd Lipcon
Software Engineer, Cloudera
