hbase-user mailing list archives

From Harsh J <ha...@cloudera.com>
Subject Re: Reducer
Date Mon, 12 Sep 2011 12:39:18 GMT
Hey Sriram,

On Mon, Sep 12, 2011 at 5:48 PM, sriram <rsriramtce@gmail.com> wrote:
>
> Hi,
>            I set the setNumReduceTasks value to 10 for the job in the code, but
> my reduce always has only 1 task, and the reduce phase takes a very long
> time. Also, the reducer gets stuck at 33% and only shows progress again
> after 60%.
>
> What is the problem? Am I missing something?

- Could you describe your job in detail? Does it use TableMapReduceUtil?
What does it do, and which HBase APIs does it call after you set the
number of reduce tasks in the job configuration? (Note: if this is a
general MapReduce question, you should send it to
mapreduce-user@hadoop.apache.org instead.)

- If you call setNumReduceTasks(…) and see no change, some part of your
code, or a library you use, is probably overriding it back to 1, either
deliberately or due to a bug. It's hard to tell what is resetting it
without a look at the code.

- You, or your job-submission code, may be calling
TableMapReduceUtil.limitNumReduceTasks(…) or
TableMapReduceUtil.setNumReduceTasks(…), both of which cap the number of
reducers at the number of regions in the output table. In that case,
check whether your table has only a single large region, and get to
fixing/splitting it, as the job won't parallelize otherwise.
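The capping behavior those helpers apply can be sketched roughly as follows. This is a hypothetical simulation of the effect, not HBase's actual implementation; the method and parameter names here are made up for illustration:

```java
// Rough sketch of the reducer-capping effect of
// TableMapReduceUtil.limitNumReduceTasks(...): the job never gets more
// reducers than the output table has regions.
public class ReducerCapSketch {

    // Hypothetical stand-in: regionCount would come from the output table.
    static int limitNumReduceTasks(int requestedReducers, int regionCount) {
        return Math.min(requestedReducers, regionCount);
    }

    public static void main(String[] args) {
        // With a single-region table, even setNumReduceTasks(10) collapses to 1.
        System.out.println(limitNumReduceTasks(10, 1)); // prints 1
        // Splitting the table into more regions restores parallelism.
        System.out.println(limitNumReduceTasks(10, 8)); // prints 8
    }
}
```

If this is what is happening to you, the fix is on the table side (pre-split or split the single large region), not in the job configuration.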

- I believe a single reducer applied over a large amount of data spends a
lot of time sorting, which is probably why you're noticing the delay
between 33% and 66% progress.
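The 33%/66% marks aren't arbitrary: in classic Hadoop MapReduce, a reduce task's progress bar is split into thirds across its three internal phases, copy (shuffle), sort, and reduce. A rough sketch of that mapping (illustrative only, not Hadoop's actual progress code):

```java
// Sketch of how a reduce task's reported progress maps onto its three
// internal phases: copy (shuffle) owns 0-33%, sort 33-66%, reduce 66-100%.
// A task "stuck" between 33% and 66% is therefore spending its time sorting.
public class ReduceProgressSketch {
    enum Phase { COPY, SORT, REDUCE }

    // Overall percent = (completed phases + fraction of current phase) / 3.
    static double overallPercent(Phase phase, double phaseFraction) {
        return (phase.ordinal() + phaseFraction) / 3.0 * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(overallPercent(Phase.COPY, 1.0));   // ~33.3: copy done
        System.out.println(overallPercent(Phase.SORT, 0.5));   // 50.0: mid-sort
        System.out.println(overallPercent(Phase.REDUCE, 0.0)); // ~66.7: reduce starts
    }
}
```

So a long pause at 33% followed by a jump past 66% is consistent with one overloaded reducer grinding through a large sort.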

-- 
Harsh J
