hive-user mailing list archives

From Todd Lipcon <t...@cloudera.com>
Subject Re: Strange behavior during Hive queries
Date Fri, 11 Sep 2009 21:16:05 GMT
Hi Brad,

mapred.tasktracker.map.tasks.maximum is a parameter read by the TaskTracker
when it starts up. It cannot be changed per-job.
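
As a rough illustration of the point above: because the TaskTracker reads this
value from its own configuration at startup, the number that matters is the one
in the mapred-site.xml on each worker node, not the one shown in the job's
job.xml. The sketch below parses such a file and reports the slot count. The
helper name and the sample XML are hypothetical, and the fallback of 2 is the
default mentioned in this thread.

```python
import xml.etree.ElementTree as ET

PROPERTY_NAME = "mapred.tasktracker.map.tasks.maximum"
DEFAULT_MAP_SLOTS = 2  # default value as cited in this thread


def map_slots_from_site_xml(xml_text, default=DEFAULT_MAP_SLOTS):
    """Return the map slot count set in a Hadoop *-site.xml document.

    Hypothetical helper: returns the first matching <property> value,
    or `default` when the property is not set in the file.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == PROPERTY_NAME:
            return int((prop.findtext("value") or "").strip())
    return default


# Made-up sample standing in for a worker node's mapred-site.xml.
SAMPLE = """
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>7</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(map_slots_from_site_xml(SAMPLE))
```

Running a check like this against the file on each worker (rather than on the
client machine) shows what a TaskTracker would pick up the next time it is
restarted.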

Hope that helps
-Todd

On Fri, Sep 11, 2009 at 2:06 PM, Brad Heintz <brad.heintz@gmail.com> wrote:

> TIA if anyone can point me in the right direction on this.
>
> I'm running a simple Hive query (a count on an external table comprising
> 436 files, each of ~2GB).  The cluster's mapred-site.xml specifies
> mapred.tasktracker.map.tasks.maximum = 7 - that is, 7 mappers per worker
> node.  When I run regular MR jobs via "bin/hadoop jar myJob.jar...", I see 7
> mappers spawned on each worker.
>
> The problem:  When I run my Hive query, I see 2 mappers spawned per worker.
>
> When I do "set -v;" from the Hive command line, I see
> mapred.tasktracker.map.tasks.maximum = 7.
>
> The job.xml for the Hive query shows mapred.tasktracker.map.tasks.maximum =
> 7.
>
> The only lead I have is that the default for
> mapred.tasktracker.map.tasks.maximum is 2, and even though it's overridden
> in the cluster's mapred-site.xml I've tried redundantly overriding this
> variable everyplace I can think of (Hive command line with "-hiveconf",
> using set from the Hive prompt, et al) and nothing works.  I've combed the
> docs & mailing list, but haven't run across the answer.
>
> Does anyone have any ideas what (if anything) I'm missing?  Is this some
> quirk of Hive, where it decides that 2 mappers per tasktracker is enough,
> and I should just leave it alone?  Or is there some knob I can fiddle to get
> it to use my cluster at full power?
>
> Many thanks in advance,
> - Brad
>
> --
> Brad Heintz
> brad.heintz@gmail.com
>
