hive-user mailing list archives

From "Jianfeng (Jeff) Zhang" <jzh...@hortonworks.com>
Subject Re: Hive - Tez error with big join - Container expired.
Date Thu, 18 Jun 2015 10:51:08 GMT

Tez holds idle containers for a while, but it also expires a container once it
reaches some threshold.
Have you set the property tez.am.container.idle.release-timeout-max.millis in tez-site.xml? And
can you attach the YARN application log?
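
For reference, a minimal sketch of how that property would appear in tez-site.xml. The value
shown is only a placeholder for illustration, not a recommended setting; the right number
depends on the cluster and the workload.

```xml
<!-- tez-site.xml fragment; the value is an illustrative assumption, not a recommendation -->
<property>
  <!-- Upper bound, in milliseconds, on how long Tez holds an idle container before releasing it -->
  <name>tez.am.container.idle.release-timeout-max.millis</name>
  <value>20000</value>
</property>
```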



Best Regards,
Jeff Zhang


From: Daniel Klinger <dk@web-computing.de>
Reply-To: "user@hive.apache.org" <user@hive.apache.org>
Date: Thursday, June 18, 2015 at 5:35 AM
To: "user@hive.apache.org" <user@hive.apache.org>
Subject: Hive - Tez error with big join - Container expired.

Hi all,

I have a pretty big Hive query. I'm joining 3 Hive tables, each of which has thousands of rows,
and grouping the result by several columns. In the Hive shell this query only reaches about
80%. After about 1400 seconds it is cancelled with the following error:

Status: Failed
Vertex failed, vertexName=Map 2, vertexId=vertex_1434357133795_0008_1_01, diagnostics=[Task
failed, taskId=task_1434357133795_0008_1_01_000033, diagnostics=[TaskAttempt 0 failed, info=[Containercontainer_1434357133795_0008_01_000039
finished while trying to launch. Diagnostics: [Container failed. Container expired since it
was unused]], TaskAttempt 1 failed, info=[Containercontainer_1434357133795_0008_01_000055
finished while trying to launch. Diagnostics: [Container failed. Container expired since it
was unused]], TaskAttempt 2 failed, info=[Containercontainer_1434357133795_0008_01_000072
finished while trying to launch. Diagnostics: [Container failed. Container expired since it
was unused]], TaskAttempt 3 failed, info=[Containercontainer_1434357133795_0008_01_000101
finished while trying to launch. Diagnostics: [Container failed. Container expired since it
was unused]]], Vertex failed as one or more tasks failed. failedTasks:1, Vertex vertex_1434357133795_0008_1_01
[Map 2] killed/failed due to:null]
DAG failed due to vertex failure. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask

My YARN ResourceManager is at 100% during the whole execution (using all of the 300 GB of memory).
I tried to extend the lifetime of my containers with the following setting in yarn-site.xml,
but without success:

yarn.resourcemanager.rm.container-allocation.expiry-interval-ms = 1200000
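
Written out as a yarn-site.xml entry, the setting above would look roughly like this (a sketch
using the same property name and value as quoted; the comment is an added annotation):

```xml
<!-- yarn-site.xml fragment: name and value taken from the setting quoted above -->
<property>
  <name>yarn.resourcemanager.rm.container-allocation.expiry-interval-ms</name>
  <!-- 1200000 ms = 20 minutes -->
  <value>1200000</value>
</property>
```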

After this change my query stays at 0% for thousands of seconds. The query itself works
(tested with less data). How can I solve this problem?

Thanks for your help.

Greetz
DK
