hive-user mailing list archives

From Gary Clark <gcl...@neces.com>
Subject RE: Seeing strange limit
Date Wed, 30 Dec 2015 15:13:57 GMT
Unfortunately I cannot find mapred.child.java.opts in this version:

    <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
        <description>Heap-size for child jvms of maps.</description>
    </property>

I’m assuming this is the blighter. Is there a way to measure from the dataset and the query
at what size I need to set this limit?
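For what it's worth, the same setting can also be raised per Hive session rather than by editing mapred-site.xml; a sketch assuming Hadoop 2.x property names (the 2560 MB container size is illustrative, just kept comfortably above the JVM heap):

```
-- In the Hive CLI, before the failing query:
SET mapreduce.map.java.opts=-Xmx2048m -XX:-UseGCOverheadLimit;
SET mapreduce.map.memory.mb=2560;
```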

Much Appreciated,
Gary C

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Wednesday, December 30, 2015 9:08 AM
To: user@hive.apache.org
Subject: Re: Seeing strange limit

In the old days mapred.child.java.opts was the one. Knowing the query and the dataset helps as
well.

On Wednesday, December 30, 2015, Gary Clark <gclark@neces.com>
wrote:
    <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>

I think this is the limit I need to tweak.

From: Gary Clark [mailto:gclark@neces.com]
Sent: Wednesday, December 30, 2015 8:59 AM
To: user@hive.apache.org
Subject: RE: Seeing strange limit

Thanks, currently I have the below:

export HADOOP_PORTMAP_OPTS="-Xmx512m $HADOOP_PORTMAP_OPTS"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"

and HADOOP_HEAPSIZE=4096

I’m assuming just raising the above would work.
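One caveat, sketched under the roles the stock hadoop-env.sh comments describe (values here are illustrative, not a recommendation): HADOOP_CLIENT_OPTS and HADOOP_PORTMAP_OPTS size client-side JVMs, and HADOOP_HEAPSIZE the daemons, so if the GC error comes from a map task, the mapreduce.map.java.opts setting is the one that matters, not these:

```shell
# hadoop-env.sh -- illustrative values only
export HADOOP_CLIENT_OPTS="-Xmx1024m $HADOOP_CLIENT_OPTS"  # client commands (fs, dfs, distcp, Hive CLI)
export HADOOP_HEAPSIZE=4096                                # daemon heap, in MB
```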

Much Appreciated,
Gary C

From: Edward Capriolo [mailto:edlinuxguru@gmail.com]
Sent: Wednesday, December 30, 2015 8:55 AM
To: user@hive.apache.org
Subject: Re: Seeing strange limit

This message means the garbage collector runs but is unable to free memory after trying for
a while.

This can happen for a lot of reasons. With Hive it usually happens when a query has a lot
of intermediate data.

For example, imagine a few months ago count(distinct(ip)) returned 20k. Everything works,
then your data changes and suddenly you have issues.
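To illustrate why the intermediate data grows like that, a toy Python sketch (not Hive internals, just the principle that a distinct count must remember every value it has seen):

```python
import sys

def distinct_count_memory(n):
    """Build the set a naive count(distinct ...) would hold and report its size."""
    seen = {f"10.{i // 65536}.{(i // 256) % 256}.{i % 256}" for i in range(n)}
    return len(seen), sys.getsizeof(seen)

# 20k distinct values fit easily; 2M of them need a far larger structure,
# which is how a query that once worked starts exhausting the heap.
count_small, bytes_small = distinct_count_memory(20_000)
count_large, bytes_large = distinct_count_memory(2_000_000)
print(count_small, bytes_small)
print(count_large, bytes_large)
```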

Try tuning, mostly raising your Xmx.

On Wednesday, December 30, 2015, Gary Clark <gclark@neces.com>
wrote:
Hello,

I have a multi-node cluster (Hadoop 2.6.0) and am seeing the below error in the Hadoop
logs, which causes the Hive workflow to fail:

45417 [main] ERROR org.apache.hadoop.hive.ql.Driver  - FAILED: Execution Error, return code
-101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. GC overhead limit exceeded

I have been running for months without problems. When I removed a large number of the files
from the directory I was running the query on, the query succeeded. It looks like I’m
hitting a limit, but I’m not sure how to remedy it.

Has anybody else seen this problem?

Thanks,
Gary C


--
Sorry this was sent from mobile. Will do less grammar and spell check than usual.

