hive-user mailing list archives

From rohan monga <monga.ro...@gmail.com>
Subject Re: Severely hit by "curse of last reducer"
Date Thu, 17 Nov 2011 21:44:17 GMT
Hi Mark,
I have tried setting hive.optimize.skewjoin=true, but I get a
NullPointerException after the first stage of the query completes.
Why does that happen?

Thanks,
--
Rohan Monga



On Thu, Nov 17, 2011 at 1:37 PM, Mark Grover <mgrover@oanda.com> wrote:
> Ayon,
> I see. From what you explained, skew join seems like what you want. Have you tried that already?
>
> Details on how skew join works are in this presentation. Jump to the 15-minute mark if you just want to hear about skew joins.
> http://www.youtube.com/watch?v=OB4H3Yt5VWM
>
> I bet you could also find something in the mail list archives related to Skew Join.
>
> In a nutshell (from the video),
> set hive.optimize.skewjoin=true
> set hive.skewjoin.key=<Threshold>
>
> should do the trick for you. Threshold, I believe, is the number of records for a key that you consider large enough to defer till later.
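
For reference, a minimal sketch of how those two settings might look in a Hive session, applied to the query from the original message. The threshold of 100000 is only an illustration (it is Hive's documented default for hive.skewjoin.key); a suitable value depends on the data.

```sql
-- Enable the skew join optimization for this session.
set hive.optimize.skewjoin=true;
-- Join keys with more rows than this threshold are treated as skewed
-- and deferred to a follow-up job. 100000 is illustrative only.
set hive.skewjoin.key=100000;

-- The query from the original message, unchanged.
select partner_name, dates, sum(coins_granted)
from table1 u join table2 p on u.partner_id = p.id
group by partner_name, dates;
```
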
>
> Good luck!
> Mark
>
> ----- Original Message -----
> From: "Ayon Sinha" <ayonsinha@yahoo.com>
> To: "Mark Grover" <mgrover@oanda.com>, user@hive.apache.org
> Sent: Wednesday, November 16, 2011 10:53:19 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
>
>
> Only one reducer is always stuck. My table2 is small, but using a map join makes my mappers run out of memory. My max reducers is 32 (also the max reduce capacity). I tried setting the number of reducers to a higher number (even 6000, which is approximately the number of combinations of dates and names I have), only to have lots of reducers with no data.
> So I am quite sure it is some key in stage-1 that is doing this.
>
> -Ayon
> See My Photos on Flickr
> Also check out my Blog for answers to commonly asked questions.
>
>
>
>
> From: Mark Grover <mgrover@oanda.com>
> To: user@hive.apache.org; Ayon Sinha <ayonsinha@yahoo.com>
> Sent: Wednesday, November 16, 2011 6:54 PM
> Subject: Re: Severely hit by "curse of last reducer"
>
> Hi Ayon,
> Is it one particular reduce task that is slow or the entire reduce phase? How many reduce tasks did you have, anyways?
>
> Looking into what the reducer key was might only make sense if a particular reduce task was slow.
>
> If your table2 is small enough to fit in memory, you might want to try a map join.
> More details at:
> http://www.facebook.com/note.php?note_id=470667928919
>
> Let me know what you find.
>
> Mark
>
> ----- Original Message -----
> From: "Ayon Sinha" < ayonsinha@yahoo.com >
> To: "Hive Mailinglist" < user@hive.apache.org >
> Sent: Wednesday, November 16, 2011 9:03:23 PM
> Subject: Severely hit by "curse of last reducer"
>
>
>
> Hi,
> Where do I find the log of what reducer key is causing the last reducer to go on for hours? The reducer logs don't say much about the key it's processing. Is there a way to enable a debug mode where it would log the key it's processing?
>
>
> My query looks like:
>
>
> select partner_name, dates, sum(coins_granted) from table1 u join table2 p on u.partner_id=p.id group by partner_name, dates
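
One way to look for a hot key without reducer-side logging is to count rows per join key directly. This is a sketch, assuming the skew is on table1.partner_id (the thread only suspects a stage-1 key, so this is a guess); the limit of 20 is arbitrary.

```sql
-- Count rows per join key in the large table and list the heaviest keys first.
-- Assumption: partner_id is the skewed column; adjust if the join key differs.
select partner_id, count(*) as row_count
from table1
group by partner_id
order by row_count desc
limit 20;
```

A single partner_id (possibly NULL) with a far larger row_count than the rest would be consistent with one reducer receiving most of the join input.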
>
>
>
> My uncompressed size of table1 is about 30GB.
>
> -Ayon
>
>
>
