hbase-user mailing list archives

From Avery Ching <ach...@yahoo-inc.com>
Subject TableInputFormat and number of mappers == number of regions
Date Sat, 09 Apr 2011 16:15:56 GMT

First off, I'd like to say thanks to the developers for HBase, it's been fun to work with.

I've been using TableInputFormat to run a Map-Reduce job and ran into an issue.

Exception in thread "main" org.apache.hadoop.ipc.RemoteException: java.io.IOException: java.io.IOException:
The number of tasks for this job 149624 exceeds the configured limit 100000

The table I'm accessing has 149624 regions, but my Hadoop instance won't allow me to start
a job with that many map tasks.  After briefly looking at the TableInputFormatBase code, it
appears that since each TableSplit only knows about a single region, my job is forced into
having one mapper per region.  Since the Hadoop instance I'm using is shared, I'm concerned
that even if the configured limit were raised, jobs with so many mappers would eventually
wreak havoc on the job tracker.

Given that I have no control over the number of regions in the table (it's maintained by
someone else), is the only solution to implement another input format (e.g., a
MultiRegionTableFormat) that allows an InputSplit to span more than one region?  I don't mind
doing it, but I didn't want to write it if another solution already exists.

Apologies if this issue has been raised before, but a quick search didn't turn anything up.


