hive-issues mailing list archives

From "Alexandre Linte (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-15017) Random job failures with MapReduce and Tez
Date Thu, 20 Oct 2016 09:48:58 GMT

    [ https://issues.apache.org/jira/browse/HIVE-15017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15591369#comment-15591369 ]

Alexandre Linte commented on HIVE-15017:
----------------------------------------

Hi [~sershe],
The "yarn logs" command doesn't return the logs, as you can see below.
{noformat}
[root@namenode01 ~]# yarn logs -applicationId application_1475850791417_0105
/Products/YARN/logs/hdfs/logs/application_1475850791417_0105 does not exist.
Log aggregation has not completed or is not enabled.
{noformat}
So I decided to dig into the logs manually. I found interesting things on both datanode05
and datanode06. The error "255" appears regularly, and I think this is the cause of the
container crash.

I uploaded the relevant part of the logs.
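
For anyone who wants to repeat the manual search on their own NodeManager logs, here is a minimal sketch in Python. It assumes the exit code is logged in a line shaped like "Exit code from container container_<id> is : <code>"; that shape is an assumption and may need adjusting to match the attached logs. The function name count_exit_codes is hypothetical and only used for illustration.

```python
import re
from collections import Counter

# Assumed log line shape (adjust the pattern if your NodeManager logs differ):
#   "... Exit code from container container_<id> is : 255"
EXIT_RE = re.compile(r"Exit code from container (container_\S+) is : (\d+)")


def count_exit_codes(lines, app_id=None):
    """Count container exit codes, optionally restricted to one application."""
    # A container id embeds the numeric part of its application id:
    # application_X_Y -> container_X_Y_<attempt>_<number>
    wanted = app_id.replace("application_", "") if app_id else None
    counts = Counter()
    for line in lines:
        match = EXIT_RE.search(line)
        if not match:
            continue
        container, code = match.groups()
        if wanted and wanted not in container:
            continue
        counts[int(code)] += 1
    return counts
```

Passing an open log file and "application_1475850791417_0105" as app_id would show how often each exit code occurs for that application on a given node.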

> Random job failures with MapReduce and Tez
> ------------------------------------------
>
>                 Key: HIVE-15017
>                 URL: https://issues.apache.org/jira/browse/HIVE-15017
>             Project: Hive
>          Issue Type: Bug
>          Components: Hive
>    Affects Versions: 2.1.0
>         Environment: Hadoop 2.7.2, Hive 2.1.0
>            Reporter: Alexandre Linte
>            Priority: Critical
>         Attachments: hive_cli_mr.txt, hive_cli_tez.txt, nodemanager_logs_mr_job.txt,
yarn_syslog_mr_job.txt, yarn_syslog_tez_job.txt
>
>
> Since Hive 2.1.0, we are facing a blocking issue on our cluster. All the jobs are failing
randomly on both MapReduce and Tez. 
> In both cases, we don't have any ERROR or WARN message in the logs. You can find attached:
> - hive cli output errors 
> - yarn logs for a tez and mapreduce job
> - nodemanager logs (mr only, we have the same logs with tez)
> Note: This issue doesn't exist with Pig jobs (mr + tez), Spark jobs (mr), so this cannot
be a Hadoop / YARN issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
