hbase-user mailing list archives

From Panayotis Antonopoulos <antonopoulos...@hotmail.com>
Subject RE: HFiles that fit within a single region VS better load balancing at reduce phase
Date Wed, 25 May 2011 17:23:54 GMT

So your answer would be that it is better to aim for the best possible load balancing during
the reduce phase rather than making sure the output HFiles each fit within a single region,
because the splitting done by LoadIncrementalHFiles (the incremental load) is rather fast?
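A simplified model of the check being discussed may help. An HFile fits a single region if its first and last row keys fall in the same region; otherwise it has to be split at each region boundary it crosses. This is a sketch under stated assumptions, not HBase's LoadIncrementalHFiles code: row keys are modeled as Strings (HBase actually compares byte[] keys), the table's region start keys are assumed to be known and sorted with the first one empty, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (hypothetical names): row keys are Strings here, whereas HBase compares byte[] keys.
class HFileSplitModel {

    // Returns the index of the region containing key, given sorted region start keys.
    // startKeys[0] is "" (the empty start key of the first region), so every key maps to a region.
    static int regionIndex(String[] startKeys, String key) {
        int idx = Arrays.binarySearch(startKeys, key);
        // For an absent key, binarySearch returns -(insertionPoint) - 1;
        // the containing region is the one just before the insertion point.
        return idx >= 0 ? idx : -idx - 2;
    }

    // Returns the keys at which an HFile spanning [firstKey, lastKey] would have to be split
    // so that every piece fits within one region. An empty list means it already fits.
    static List<String> splitKeys(String[] startKeys, String firstKey, String lastKey) {
        int first = regionIndex(startKeys, firstKey);
        int last = regionIndex(startKeys, lastKey);
        // Every region start key after the first region touched, up to the last one, is a split point.
        return new ArrayList<>(Arrays.asList(startKeys).subList(first + 1, last + 1));
    }

    public static void main(String[] args) {
        String[] startKeys = {"", "g", "p"}; // hypothetical table with three regions
        System.out.println(splitKeys(startKeys, "h", "k")); // []
        System.out.println(splitKeys(startKeys, "a", "z")); // [g, p]
    }
}
```

Here an HFile covering "h" to "k" fits the region starting at "g", while one covering "a" to "z" would be cut at "g" and "p" into three pieces. Each such split is extra work at load time, which is the cost being weighed against reduce-phase balance.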

> Date: Wed, 25 May 2011 09:20:10 -0700
> Subject: Re: HFiles that fit within a single region VS better load balancing at reduce phase
> From: yuzhihong@gmail.com
> To: user@hbase.apache.org
> 
> LoadIncrementalHFiles splits an HFile if it doesn't fit within a single
> region.
> 
> Please refer to the following JIRAs, which speed up LoadIncrementalHFiles:
> https://issues.apache.org/jira/browse/HBASE-3871
> https://issues.apache.org/jira/browse/HBASE-3721
> 
> Note: the parallelized splitting of HFiles by LoadIncrementalHFiles runs
> on a single machine.
> 
> Thanks
> 
> 2011/5/25 Panayotis Antonopoulos <antonopoulospan@hotmail.com>
> 
> >
> > Hello,
> > I am currently working on an MR job that will output HFiles to be
> > bulk loaded into an HBase table.
> > According to the HBase site, for bulk loading to be efficient, each HFile
> > produced by the MR job should fit within a single region.
> > To achieve that, I use the TotalOrderPartitioner so that each
> > reducer gets key/value pairs from a single region.
> > However, this prevents partitioning the mappers' output into equal splits,
> > which would give the best possible load balancing during the reduce phase.
> >
> > So I would like to ask how important it is to create HFiles that fit
> > within a single region.
> > If it makes bulk loading much faster, it is probably better to sacrifice
> > load balancing.
> > But is this the case?
> > Has anyone tried both approaches?
> >
> > Thank you in advance!
> > Panagiotis.
> >
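To make the trade-off in the original question concrete, the sketch below compares two ways of choosing reducer boundaries for a total-order partitioning: using the table's region start keys (so each reducer's HFile fits one region) versus cut points taken at evenly spaced positions of a sorted key sample (roughly the idea behind sampling-based split points). It is a toy model under stated assumptions, not the Hadoop TotalOrderPartitioner or InputSampler API: keys are Strings, the sample and region boundaries are made up, and all names are hypothetical.

```java
import java.util.Arrays;

// Toy model (hypothetical names, made-up data); not the Hadoop TotalOrderPartitioner API.
class PartitionBalanceModel {

    // Total-order lookup: partition i receives keys in [bounds[i], bounds[i + 1]).
    // bounds must be strictly increasing with bounds[0] == "" so every key lands somewhere.
    static int partition(String[] bounds, String key) {
        int idx = Arrays.binarySearch(bounds, key);
        return idx >= 0 ? idx : -idx - 2;
    }

    // Counts how many of the given keys each partition (reducer) would receive.
    static int[] loads(String[] bounds, String[] keys) {
        int[] counts = new int[bounds.length];
        for (String k : keys) {
            counts[partition(bounds, k)]++;
        }
        return counts;
    }

    // Picks n - 1 cut points at evenly spaced positions of a sorted sample so that
    // each of the n partitions gets roughly the same number of sampled keys.
    static String[] equalSplitBounds(String[] sortedSample, int n) {
        String[] bounds = new String[n];
        bounds[0] = "";
        for (int i = 1; i < n; i++) {
            bounds[i] = sortedSample[i * sortedSample.length / n];
        }
        return bounds;
    }

    public static void main(String[] args) {
        String[] regionStartKeys = {"", "m"}; // hypothetical table with two regions
        String[] sample = {"a", "b", "c", "d", "e", "f", "n"}; // skewed toward the first region

        // Region-aligned: one reducer per region, each HFile fits a single region.
        System.out.println(Arrays.toString(loads(regionStartKeys, sample))); // [6, 1]

        // Equal splits: balanced reducers, but partition boundaries ignore region boundaries.
        String[] balanced = equalSplitBounds(sample, 2);
        System.out.println(Arrays.toString(balanced)); // [, d]
        System.out.println(Arrays.toString(loads(balanced, sample))); // [3, 4]
    }
}
```

With this made-up sample, region-aligned boundaries give reducer loads of [6, 1], while equal-split boundaries give [3, 4]. However, the second reducer's key range ("d" through "n") straddles the region start key "m", so its HFile would have to be split when it is bulk loaded.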