hbase-user mailing list archives

From "Billy Pearson" <billy_pear...@sbcglobal.net>
Subject Re: Bulk import - does sort order of input data affect success rate?
Date Sun, 05 Apr 2009 06:29:53 GMT


I found that using HRegionPartitioner speeds things up on tables that are
not new and have multiple regions per server. I might look into making an
HServerPartitioner (one reduce per server), but that would lose
performance if the server has many spare cores to use.
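
To make the idea concrete, here is a rough sketch of how a region-based
partitioner can route a row key to a reducer. This is not the actual
HRegionPartitioner source; the class name, method names, and the use of
Strings instead of byte[] keys are assumptions made to keep the example
self-contained.

```java
import java.util.Arrays;

// Hypothetical sketch: map a row key to the region that would hold it,
// then map that region to a reducer. Real HBase keys are byte arrays
// compared byte by byte; Strings are used here only for readability.
class RegionPartitionSketch {

    // sortedStartKeys must be sorted ascending, and the first region
    // is assumed to have the empty string as its start key.
    static int regionIndexFor(String[] sortedStartKeys, String rowKey) {
        int pos = Arrays.binarySearch(sortedStartKeys, rowKey);
        if (pos >= 0) {
            return pos; // the row key is exactly a region start key
        }
        int insertionPoint = -pos - 1;
        // The owning region is the one whose start key sorts just before
        // the row key.
        return Math.max(0, insertionPoint - 1);
    }

    // With fewer reducers than regions, several regions share a reducer.
    static int partitionFor(String[] sortedStartKeys, String rowKey, int numReducers) {
        return regionIndexFor(sortedStartKeys, rowKey) % numReducers;
    }
}
```

A per-server variant would replace the region index with an index of the
server hosting that region, which is where the trade-off against spare
cores comes in.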

Billy

----- Original Message ----- 
From: "Ryan Rawson" <ryanobjc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
Newsgroups: gmane.comp.java.hadoop.hbase.user
To: <hbase-user-7ArZoLwFLBtd/SJB6HiN2Ni2O/JbrIOy@public.gmane.org>
Sent: Thursday, April 02, 2009 5:53 PM
Subject: Re: Bulk import - does sort order of input data affect success rate?


> hey,
>
> sorted = slower, randomized = faster.
>
> this is because if it is sorted in natural key order, you tend to hotspot
> in 1 or 2 regions.
>
> I don't use table output format, I use direct commits from the map, no
> reduce. That seems to be the most performant solution.
>
> have fun!
>
>
> On Thu, Apr 2, 2009 at 1:36 PM, Stuart White
> <stuart.white1-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>
>> On Thu, Apr 2, 2009 at 3:30 PM, Ryan Rawson 
>> <ryanobjc-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> > The last thing - success should not be a function of sort order.
>> >
>> > However, speed will be related.
>>
>> How?  Sorted = faster, or Sorted = slower?
>>
>> >
>> > One thing I found I had to do was:
>> >    private void doCommit(HTable t, BatchUpdate update) throws IOException {
>> >      boolean committed = false;
>> >      // Keep retrying until the commit goes through.
>> >      while (!committed) {
>> >        try {
>> >          t.commit(update);
>> >          committed = true;
>> >        } catch (RetriesExhaustedException e) {
>> >          // DAMN, ignore and try again
>> >        }
>> >      }
>> >    }
>> >
>>
>> I'm running a mapred job, using TableOutputFormat to write the results
>> to HBase.  For the code you've provided, was that for a custom output
>> format?  Or a standalone (non-mapred) application?  I see the point
>> you're making, I just don't understand where I'd put that code.
>> Thanks!
>>
> 
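
The doCommit loop quoted above retries forever when the client runs out
of retries. As a rough sketch of a bounded alternative, the helper below
retries an action a fixed number of times with a doubling pause between
attempts. The class and method names are hypothetical, and IOException
is only a stand-in for whatever exception the client call throws; no
HBase classes are used, so it shows the retry pattern rather than a
drop-in replacement.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper: retry an action up to maxAttempts times, sleeping
// between attempts and doubling the sleep each time.
class BoundedRetry {

    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long initialBackoffMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long backoffMs = initialBackoffMs;
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (IOException e) {
                last = e; // remember the failure in case every attempt fails
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMs);
                    backoffMs *= 2;
                }
            }
        }
        throw last; // all attempts failed, surface the last error
    }
}
```

In a map task, the action would wrap the single commit call, so a region
that stays unavailable eventually fails the task instead of spinning.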


