hive-user mailing list archives

From Brad Heintz <brad.hei...@gmail.com>
Subject Re: Strange behavior during Hive queries
Date Sun, 13 Sep 2009 16:32:49 GMT
No, I'm using vanilla 0.20.0.  Other, non-Hive jobs are also running with
more mappers, so I don't think it'd be that setting even if I had it
available.
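
For anyone who wants to repeat the job.xml comparison mentioned in the quoted thread below, here is a minimal sketch. It assumes the usual Hadoop configuration layout (a <configuration> root holding <property> elements with <name> and <value> children). The function names, the sample documents, and the example.hypothetical.setting property are made up for illustration and are not taken from either real job.

```python
import xml.etree.ElementTree as ET


def load_props(xml_text):
    """Return {name: value} for a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            # A missing <value> element is treated as an empty string.
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def diff_props(a, b):
    """Return {name: (value_in_a, value_in_b)} for properties that differ.

    None marks a property that is absent on that side.
    """
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


# Illustrative stand-ins for the two job.xml files (values are invented).
REGULAR = """
<configuration>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>7</value></property>
  <property><name>mapred.job.name</name><value>myJob</value></property>
</configuration>
"""

HIVE = """
<configuration>
  <property><name>mapred.tasktracker.map.tasks.maximum</name><value>7</value></property>
  <property><name>mapred.job.name</name><value>hive-query</value></property>
  <property><name>example.hypothetical.setting</name><value>x</value></property>
</configuration>
"""

if __name__ == "__main__":
    changes = diff_props(load_props(REGULAR), load_props(HIVE))
    for name, (regular, hive) in changes.items():
        print(f"{name}: regular={regular!r} hive={hive!r}")
```

To run it against real files, read each job.xml into a string (or swap ET.fromstring for ET.parse(path).getroot()) and pass the results to diff_props. Comparing by property name rather than line by line means ordering and whitespace differences between the two files do not show up as noise.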

On Fri, Sep 11, 2009 at 5:28 PM, Todd Lipcon <todd@cloudera.com> wrote:

> Hrm... sorry, I didn't read your original query closely enough.
>
> I'm not sure what could be causing this. The map.tasks.maximum parameter
> shouldn't affect it at all - it only affects the number of slots on the
> trackers.
>
> By any chance do you have mapred.max.maps.per.node set? This is a
> configuration parameter added by HADOOP-5170 - it's not in trunk or the
> vanilla 0.18.3 release, but if you're running Cloudera's 0.18.3 release this
> parameter could cause the behavior you're seeing. However, it would
> certainly not default to 2, so I'd be surprised if that were it.
>
> -Todd
>
>
> On Fri, Sep 11, 2009 at 2:20 PM, Brad Heintz <brad.heintz@gmail.com> wrote:
>
>> Todd -
>>
>> Of course; it makes sense that it would be that way.  But I'm still left
>> wondering why, then, my Hive queries are only using 2 mappers per task
>> tracker when other jobs use 7.  I've gone so far as to diff the job.xml
>> files from a regular job and a Hive query, and didn't turn up anything -
>> though clearly, it has to be something Hive is doing.
>>
>> Thanks,
>> - Brad
>>
>>
>>
>> On Fri, Sep 11, 2009 at 5:16 PM, Todd Lipcon <todd@cloudera.com> wrote:
>>
>>> Hi Brad,
>>>
>>> mapred.tasktracker.map.tasks.maximum is a parameter read by the
>>> TaskTracker when it starts up. It cannot be changed per-job.
>>>
>>> Hope that helps
>>> -Todd
>>>
>>>
>>> On Fri, Sep 11, 2009 at 2:06 PM, Brad Heintz <brad.heintz@gmail.com> wrote:
>>>
>>>> TIA if anyone can point me in the right direction on this.
>>>>
>>>> I'm running a simple Hive query (a count on an external table comprising
>>>> 436 files, each of ~2GB).  The cluster's mapred-site.xml specifies
>>>> mapred.tasktracker.map.tasks.maximum = 7 - that is, 7 mappers per worker
>>>> node.  When I run regular MR jobs via "bin/hadoop jar myJob.jar...", I see 7
>>>> mappers spawned on each worker.
>>>>
>>>> The problem:  When I run my Hive query, I see 2 mappers spawned per
>>>> worker.
>>>>
>>>> When I do "set -v;" from the Hive command line, I see
>>>> mapred.tasktracker.map.tasks.maximum = 7.
>>>>
>>>> The job.xml for the Hive query shows
>>>> mapred.tasktracker.map.tasks.maximum = 7.
>>>>
>>>> The only lead I have is that the default for
>>>> mapred.tasktracker.map.tasks.maximum is 2, and even though it's overridden
>>>> in the cluster's mapred-site.xml I've tried redundantly overriding this
>>>> variable everyplace I can think of (Hive command line with "-hiveconf",
>>>> using set from the Hive prompt, et al) and nothing works.  I've combed the
>>>> docs & mailing list, but haven't run across the answer.
>>>>
>>>> Does anyone have any ideas what (if anything) I'm missing?  Is this some
>>>> quirk of Hive, where it decides that 2 mappers per tasktracker is enough,
>>>> and I should just leave it alone?  Or is there some knob I can fiddle to
>>>> get it to use my cluster at full power?
>>>>
>>>> Many thanks in advance,
>>>> - Brad
>>>>
>>>> --
>>>> Brad Heintz
>>>> brad.heintz@gmail.com
>>>>
>>>
>>>
>>
>>
>> --
>> Brad Heintz
>> brad.heintz@gmail.com
>>
>
>


-- 
Brad Heintz
brad.heintz@gmail.com
