hbase-user mailing list archives

From stack <st...@duboce.net>
Subject Re: HBase bulk load
Date Thu, 14 Jan 2010 06:02:49 GMT
On Wed, Jan 13, 2010 at 9:49 PM, Sriram Muthuswamy Chittathoor <
sriramc@ivycomptech.com> wrote:

> I am trying to use this technique to say bulk load 20 billion rows.  I
> tried it on a smaller set 20 million rows. A few things I had to take
> care was to write a custom partitioning logic so that a range of keys
> only go to a particular reduce since there was some mention of global
> ordering.
> For example  Users  (1 --  1mill) ---> Reducer 1 and so on

Good.
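
The range partitioning described above (keys 1 to 1 million go to reducer 1, and so on) can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the class name, the `upperBounds` array, and the sample boundaries are all hypothetical. In a real job the same lookup would presumably live inside the `getPartition` method of a Hadoop `Partitioner` subclass; the sketch leaves Hadoop out so that it runs on its own.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch of range partitioning by key.
class RangePartitionSketch {
    // Inclusive upper bound of each reducer's key range, in ascending order.
    // Keys above the last bound fall through to one extra, final reducer.
    private final long[] upperBounds;

    RangePartitionSketch(long[] upperBounds) {
        this.upperBounds = upperBounds.clone();
    }

    // Returns the index of the first range whose upper bound is >= key.
    int partitionFor(long key) {
        int pos = Arrays.binarySearch(upperBounds, key);
        // When the key is not itself a bound, binarySearch returns
        // -(insertionPoint) - 1, so recover the insertion point.
        return pos >= 0 ? pos : -(pos + 1);
    }

    public static void main(String[] args) {
        // Assumed boundaries: reducer 0 gets keys up to 1 million,
        // reducer 1 gets keys up to 2 million, reducer 2 gets the rest.
        RangePartitionSketch p =
            new RangePartitionSketch(new long[] {1_000_000L, 2_000_000L});
        for (long key : new long[] {1L, 1_000_000L, 1_000_001L, 2_500_000L}) {
            System.out.println(key + " -> reducer " + p.partitionFor(key));
        }
    }
}
```

Because every reducer receives one contiguous key range and sorts within it, the reducer outputs taken in index order are globally ordered, which is the property the bulk load depends on.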

> My questions are:
> 1.  Can I divide the bulk loading into multiple runs  --  the existing
> bulk load bails out if it finds a HDFS output directory with the same
> name

No.  It's not currently written to do that but especially if your keys are
ordered, it probably wouldn't take much to make the above work (first job
does the first set of keys, and so on).
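
One way to picture the "first job does the first set of keys, and so on" idea is to plan the runs up front, giving each run a contiguous key range and its own output directory so that no run collides with an existing one. The sketch below only does that planning; the class name, the `run-N` directory naming, and the assumption that keys are the numbers 1..N are illustrative and not part of HBase. Whether the per-run outputs can then be combined for loadtable.rb is not settled in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical planner: splits an ordered key space into consecutive runs.
class BulkLoadRunPlan {
    // One planned run: keys firstKey..lastKey (inclusive) and a private output directory.
    static final class Run {
        final long firstKey;
        final long lastKey;
        final String outputDir;

        Run(long firstKey, long lastKey, String outputDir) {
            this.firstKey = firstKey;
            this.lastKey = lastKey;
            this.outputDir = outputDir;
        }
    }

    // Splits keys 1..totalKeys into runs of at most keysPerRun keys each.
    static List<Run> plan(long totalKeys, long keysPerRun, String baseDir) {
        if (keysPerRun <= 0) {
            throw new IllegalArgumentException("keysPerRun must be positive");
        }
        List<Run> runs = new ArrayList<>();
        int index = 0;
        for (long first = 1; first <= totalKeys; first += keysPerRun) {
            long last = Math.min(first + keysPerRun - 1, totalKeys);
            // Each run writes to its own directory, so a rerun of one range
            // never touches the output of another.
            runs.add(new Run(first, last, baseDir + "/run-" + index));
            index++;
        }
        return runs;
    }

    public static void main(String[] args) {
        // Assumed sizes from the thread: 20 billion rows in runs of 10 billion.
        for (Run r : plan(20_000_000_000L, 10_000_000_000L, "bulk")) {
            System.out.println(r.firstKey + ".." + r.lastKey + " -> " + r.outputDir);
        }
    }
}
```

If a run fails, only that run's key range and directory need to be redone, which is the restart behaviour asked about in question 2.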

> 2.  What I want to do is make multiple runs of 10 billion and then
> combine the output before running  loadtable.rb --  is this possible ?
> I am thinking this may be required in case my MR bulk loading fails in
> between and I need to start from where I crashed

Well, MR does retries but, yeah, you could run into some issue at the 10B
mark and want to then start over from there rather than start from the

One thing that the current setup does not do is remove the task hfile on
failure.  We should add this.  It would fix the case where speculative
execution is enabled and the speculative tasks are killed, so that we don't
leave around half-made hfiles (Currently I believe they show as zero-length
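
Until such cleanup exists, one could scan a job's output for zero-length files before loading it. The sketch below is only an illustration under loud assumptions: it walks a local directory with `java.nio.file` as a stand-in for HDFS, the class and method names are hypothetical, and it assumes leftover files from killed tasks really do appear as zero-length, which the message above only states as a belief.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical check for leftover empty files under an output directory.
class ZeroLengthFileScan {
    // Returns every regular file under dir whose size is exactly zero bytes.
    static List<Path> findEmptyFiles(Path dir) throws IOException {
        List<Path> empty = new ArrayList<>();
        // Files.walk holds directory handles open, so close it when done.
        try (Stream<Path> walk = Files.walk(dir)) {
            for (Path p : (Iterable<Path>) walk::iterator) {
                if (Files.isRegularFile(p) && Files.size(p) == 0) {
                    empty.add(p);
                }
            }
        }
        return empty;
    }

    public static void main(String[] args) throws IOException {
        // The directory to scan is passed as the first argument (default: current directory).
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findEmptyFiles(dir)) {
            System.out.println("zero-length file: " + p);
        }
    }
}
```

The scan only reports candidates; deciding whether to delete them is left to the operator.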


> Any tips with huge bulk loading experience ?
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of
> stack
> Sent: Thursday, January 14, 2010 6:19 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: HBase bulk load
> See
> http://hadoop.apache.org/hbase/docs/r0.20.2/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
> St.Ack
> On Wed, Jan 13, 2010 at 4:30 PM, Ted Yu <yuzhihong@gmail.com> wrote:
> > Jonathan:
> > Since you implemented
> > https://issues.apache.org/jira/si/jira.issueviews:issue-html/HBASE-48/HBASE-48.html,
> > maybe you can point me to some document how bulk load is used ?
> > I found bin/loadtable.rb and assume that can be used to import data back
> > into HBase.
> >
> > Thanks
> >
