hive-user mailing list archives

From "Jonathan Coveney" <jcove...@gmail.com>
Subject Re: hive : question about reducers
Date Thu, 10 Feb 2011 23:49:50 GMT
How many days of data are you working on?


-----Original Message-----
From: Viral Bajaria <viral.bajaria@gmail.com>
Date: Thu, 10 Feb 2011 15:21:32 
To: <user@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: Re: hive : question about reducers

I don't use any explicit bucketing. The data is partitioned by current_date
(the partition has no hour component, so each partition holds a full 24 hours
of data).

It's not a blocker, since the job does eventually complete (very slowly), but
it would be nice to understand the reason behind this behavior and how I could
optimize the query to take full advantage of having multiple reducers
running.

-Viral
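
A minimal sketch of why a planned reducer count can still leave reducers
idle, assuming the query shuffles on a key with very few distinct values
(for example a GROUP BY or DISTRIBUTE BY on current_date over a single day).
Hadoop's default HashPartitioner routes each row to
(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks. The Python below only
simulates that rule: `java_string_hash`, `partition` and `rows_per_reducer`
are hypothetical helpers, the keys are made-up strings, and Hive computes its
own hash for the shuffle key, so the exact reducer numbers will differ. The
point that carries over is that identical keys always land on the same
reducer.

```python
from collections import Counter


def java_string_hash(s: str) -> int:
    """Java's String.hashCode() formula, returned as a signed 32-bit int."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap like Java int arithmetic
    # Reinterpret the unsigned 32-bit value as signed, as Java would.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition(key: str, num_reducers: int) -> int:
    """Mimics HashPartitioner: (hashCode & Integer.MAX_VALUE) % numReduceTasks."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def rows_per_reducer(keys, num_reducers: int) -> list:
    """Count how many rows each reducer would receive."""
    counts = Counter(partition(k, num_reducers) for k in keys)
    return [counts.get(r, 0) for r in range(num_reducers)]


# Hypothetical data: 1,000 rows that all share one key (one day of data).
print(rows_per_reducer(["2011-02-10"] * 1000, 4))

# Hypothetical data: the same row count spread over many distinct keys.
print(rows_per_reducer([f"user_{i}" for i in range(1000)], 4))
```

With a single distinct key, one reducer gets all 1,000 rows and the other
three get none, whatever the hash function; with many distinct keys the rows
spread across all four.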

On Thu, Feb 10, 2011 at 3:02 PM, Ajo Fod <ajo.fod@gmail.com> wrote:

> I've had similar experiences ... usually with bucketing.
>
> Is this your experience too?
>
> -Ajo
>
>
> On Thu, Feb 10, 2011 at 1:57 PM, Viral Bajaria <viral.bajaria@gmail.com> wrote:
>
>> Hello,
>>
>> In my Hive cluster, I have set mapred.reduce.tasks to -1, i.e. I am
>> letting Hive figure out the number of reducers it needs from the
>> data.
>>
>> When I run a query, Hive determines that it will need 4 reducers, but when
>> I look at the MapReduce logs, I see that all the work is done by a single
>> reducer while the other 3 reducers forward 0 rows. Is this just bad
>> planning on Hive's side, or am I missing something?
>>
>> Thanks,
>> Viral
>>
>
>
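
A minimal sketch of how the reducer count is estimated when
mapred.reduce.tasks is -1, assuming the Hive 0.x-era heuristic: one reducer
per hive.exec.reducers.bytes.per.reducer bytes of input (rounded up), at
least one, capped at hive.exec.reducers.max. The defaults for both settings
differ between Hive versions, so they are passed in explicitly here;
`estimate_reducers` is a hypothetical helper and the input sizes are made up.

```python
def estimate_reducers(total_input_bytes: int,
                      bytes_per_reducer: int,
                      max_reducers: int) -> int:
    """Assumed estimate used when mapred.reduce.tasks = -1.

    Mirrors hive.exec.reducers.bytes.per.reducer (bytes_per_reducer)
    and hive.exec.reducers.max (max_reducers).
    """
    # Integer ceiling division: one reducer per bytes_per_reducer of input.
    reducers = (total_input_bytes + bytes_per_reducer - 1) // bytes_per_reducer
    # Never fewer than one reducer, never more than the configured cap.
    return min(max_reducers, max(1, reducers))


# Hypothetical 3.5 GB of input with 1 GB per reducer and a cap of 999.
print(estimate_reducers(3_500_000_000, 1_000_000_000, 999))
```

Under this assumption the estimate looks only at input size, not at how many
distinct keys the query shuffles on, so a plan can request several reducers
that then receive no rows.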
