hbase-user mailing list archives

From Panayotis Antonopoulos <antonopoulos...@hotmail.com>
Subject HFiles that fit within a single region VS better load balancing at reduce phase
Date Wed, 25 May 2011 10:58:50 GMT

I am currently working on an MR job that outputs HFiles to be bulk loaded into an
HBase table.
According to the HBase site, for bulk loading to be efficient, each HFile produced by the
MR job should fit within a single region.
To achieve that, I use the TotalOrderPartitioner so that each reducer gets key/value
pairs from a single region.
However, this prevents me from partitioning the mappers' output into equal splits, which
would give the best possible load balancing during the reduce phase.
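
To make the trade-off concrete, here is a minimal Python sketch of region-aligned
partitioning. It only mimics the idea of looking up a key among sorted split points;
it is not the TotalOrderPartitioner code itself. The split keys, the row keys and the
function names region_partition and reducer_load are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter


def region_partition(key, split_keys):
    """Return the index of the region (and reducer) whose key range holds `key`.

    `split_keys` is the sorted list of region start keys, without the empty
    start key of the first region, so N split keys define N + 1 regions.
    A key equal to a split key belongs to the region that starts at that key.
    """
    return bisect_right(split_keys, key)


def reducer_load(keys, split_keys):
    """Count how many keys each reducer receives under region-aligned partitioning."""
    counts = Counter(region_partition(k, split_keys) for k in keys)
    return [counts.get(i, 0) for i in range(len(split_keys) + 1)]


# Hypothetical table with three regions split at "g" and "p",
# and row keys that are skewed towards the first region.
split_keys = ["g", "p"]
keys = ["a", "b", "c", "d", "e", "f", "h", "q"]

print(reducer_load(keys, split_keys))  # [6, 1, 1]
```

With these assumed keys, the first reducer handles six of the eight rows, while equal
splits would give each of the three reducers two or three rows.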

So I would like to ask how important it is to create HFiles that fit within a single region.
If it makes bulk loading much faster, it is probably better to sacrifice load balancing.
But is this the case?
Has anyone tried both approaches?

Thank you in advance!