hive-user mailing list archives

From Edward Capriolo <edlinuxg...@gmail.com>
Subject Re: Hadoop error 2 while joining two large tables
Date Wed, 16 Mar 2011 17:18:09 GMT
On Wed, Mar 16, 2011 at 12:51 PM, Christopher, Pat
<patrick.christopher@hp.com> wrote:
> Are you using Hive on top of Hadoop or writing a raw Hadoop job?
>
>
>
> This is the Hive list, so I'm going to assume you're running Hive. Can
> you send your HiveQL query along?
>
>
>
> Pat
>
>
>
> From: hadoop n00b [mailto:new2hive@gmail.com]
> Sent: Wednesday, March 16, 2011 3:33 AM
> To: user@hive.apache.org
> Subject: Fwd: Hadoop error 2 while joining two large tables
>
>
>
> Hello,
>
>
>
> I am trying to execute a query that joins two large tables (3 million and 20
> million records). I am getting the Hadoop error code 2 during execution.
> This happens mainly while the reducers are running. Sometimes the reducers
> complete 100% and then the error comes. The logs talk about running out of
> Heap space and GC overhead limit exceeding.
>
>
>
> I am running a 6 node cluster with child JVM memory of 1GB.
>
>
>
> Are there any parameters I could tweak to make them run? Is adding more
> nodes the solution to such a problem?
>
>
>
> Thanks!

First, make sure you are on a recent Hive; 0.6.0 is the latest release.
Next, always put the larger table on the right side of the join.
Then try setting -Xmx in mapred.child.java.opts higher than it is now;
sometimes 1024M is needed.
Finally, the data may be skewed, i.e. one key has millions of rows while
others have none. There is a Hive optimizer variable for skew joins that
might help.
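
The suggestions above could look roughly like this in a Hive session. This is only a sketch under some assumptions: the table and column names (small_table, big_table, id, val) are made up, 2048m is an assumed heap size that has to fit in the RAM of your task nodes, and hive.optimize.skewjoin with its threshold hive.skewjoin.key are, as far as I know, the skew join settings in question.

```sql
-- Hypothetical tables: small_table (~3M rows), big_table (~20M rows).

-- Raise the child JVM heap above the current 1GB (2048m is an assumed value).
SET mapred.child.java.opts=-Xmx2048m;

-- Enable the skew join optimization. Keys with more rows than
-- hive.skewjoin.key are treated as skewed and handled separately.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;

-- Check for skew first: list the join keys with the most rows.
SELECT id, COUNT(1) AS cnt
FROM big_table
GROUP BY id
ORDER BY cnt DESC
LIMIT 10;

-- Join with the larger table last (on the right), so that it is streamed
-- through the reducers while the smaller table's rows are buffered.
SELECT s.id, s.val AS small_val, b.val AS big_val
FROM small_table s
JOIN big_table b ON (s.id = b.id);
```

If the skew check shows one or two keys with millions of rows, the skew join setting is the more likely fix; if the counts are fairly even, the heap size and join order are the first things to try.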
