hbase-user mailing list archives

From "Geoff Hendrey" <ghend...@decarta.com>
Subject TableMapper and getSplits
Date Fri, 02 Apr 2010 19:19:23 GMT
I have subclassed TableInputFormat and TableMapper. My job needs to read
from two tables (one row from each) during its map method. The reduce
method needs to write out to a table. For the reads and the writes I am
using simple Get and Put operations respectively, with autoflush set to true.

One problem I see is that the number of map tasks I get with HBase is
limited to the number of regions in the table. This seems to make the
job slower than it would be if I had many more mappers. Could I improve
the situation by overriding getSplits so that I could have many more
mappers?

I saw the following documented in TableMapReduceUtil: "Ensures that the
given number of reduce tasks for the given job configuration does not
exceed the number of regions for the given table." Is there some reason
one would want to ensure that the number of tasks doesn't exceed the
number of regions? It just seems to me that having one region serve only
a single task would result in an underloaded HBase. Thoughts?
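
To make the getSplits idea concrete, here is a minimal sketch of the key-range arithmetic such an override might need: cutting one region's [startKey, endKey) range into several contiguous sub-ranges, each of which could then back its own input split. It is plain Java with no HBase dependency. The class name RegionRangeSplitter and its methods are hypothetical, not part of HBase. It assumes both keys are explicit (an empty end key meaning "end of table" would need its own handling) and that the number of pieces is small relative to the key range.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an HBase class: divides one region's row-key
// range into evenly spaced sub-ranges.
final class RegionRangeSplitter {

    // Converts a non-negative value to exactly `length` big-endian bytes.
    static byte[] toFixedLength(BigInteger value, int length) {
        byte[] raw = value.toByteArray(); // may carry a leading sign byte
        byte[] out = new byte[length];
        int copy = Math.min(raw.length, length);
        System.arraycopy(raw, raw.length - copy, out, length - copy, copy);
        return out;
    }

    // Returns `pieces` contiguous {start, end} pairs covering [startKey, endKey).
    static List<byte[][]> subdivide(byte[] startKey, byte[] endKey, int pieces) {
        if (pieces < 1) {
            throw new IllegalArgumentException("pieces must be at least 1");
        }
        // Right-pad both keys with zero bytes to a common length (plus one
        // extra byte of resolution) so they compare as unsigned numbers.
        int length = Math.max(startKey.length, endKey.length) + 1;
        BigInteger lo = new BigInteger(1, Arrays.copyOf(startKey, length));
        BigInteger hi = new BigInteger(1, Arrays.copyOf(endKey, length));
        if (lo.compareTo(hi) >= 0) {
            throw new IllegalArgumentException("startKey must sort before endKey");
        }
        BigInteger span = hi.subtract(lo);
        BigInteger count = BigInteger.valueOf(pieces);

        List<byte[][]> ranges = new ArrayList<>();
        byte[] previous = startKey;
        for (int i = 1; i <= pieces; i++) {
            // Keep the original end key as the final boundary; interior
            // boundaries are evenly spaced between the two keys.
            byte[] boundary = (i == pieces)
                ? endKey
                : toFixedLength(
                      lo.add(span.multiply(BigInteger.valueOf(i)).divide(count)),
                      length);
            ranges.add(new byte[][] {previous, boundary});
            previous = boundary;
        }
        return ranges;
    }
}
```

An overridden getSplits could call something like subdivide once per region and create one split per returned pair, keeping the region's server as the split location so the extra mappers stay local to the data. Whether more mappers per region actually helps depends on how much load each region server can absorb.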
