hbase-user mailing list archives

From Justin Cohen <justin.co...@teamaol.com>
Subject Tuning simple count m/r job
Date Mon, 13 Sep 2010 21:43:16 GMT

  I have a table with 82 regions and about 44 million rows. It takes 
almost 6 minutes to count with map reduce. Is that a reasonable rate for 
a ten machine cluster of data nodes? That's just over 12,000 rows per 
second per machine... Can I do better? Right now the only custom thing I 
am doing is setting scan.setCaching to 10,000. There's one gz column per 
row, but I just want to count rows, not decompress the columns...

Is one map task assigned to each region? Some map tasks only have a few 
thousand rows. Others have over 2 million. Does this mean the regions 
aren't balanced, or does balancing also take into account the size of 
the columns along with the number of rows?
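
For reference, the figures above can be checked with a small sketch. The 
class and method names are made up for illustration, "almost 6 minutes" 
is rounded up to 360 seconds, and the per-region average assumes rows are 
spread perfectly evenly, which is only a baseline for comparison.

```java
public class CountJobMath {
    // Rows scanned per second per machine for a job over the whole table.
    static double rowsPerSecondPerMachine(long totalRows, long seconds, int machines) {
        return (double) totalRows / seconds / machines;
    }

    // Average rows per region if rows were split perfectly evenly (an assumption).
    static double averageRowsPerRegion(long totalRows, int regions) {
        return (double) totalRows / regions;
    }

    public static void main(String[] args) {
        long totalRows = 44_000_000L; // "about 44 million rows"
        long seconds = 6 * 60;        // "almost 6 minutes", rounded up
        System.out.printf("%.0f rows/s/machine%n",
                rowsPerSecondPerMachine(totalRows, seconds, 10));
        System.out.printf("%.0f rows/region%n",
                averageRowsPerRegion(totalRows, 82));
    }
}
```

That gives roughly 12,200 rows per second per machine and an even-split 
average of roughly 537,000 rows per region, so both the few-thousand-row 
and the 2-million-row map tasks are far from that average.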

