hadoop-mapreduce-issues mailing list archives

From "Radim Kubacki (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-6312) Hive fails due to stale proxy in ClientServiceDelegate
Date Thu, 30 Apr 2015 10:27:06 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-6312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14521286#comment-14521286 ]

Radim Kubacki commented on MAPREDUCE-6312:
------------------------------------------

Another related bug was filed against Hive (https://issues.apache.org/jira/browse/HIVE-8339). This time there is a patch that applies a workaround for this problem on Hive's side.

> Hive fails due to stale proxy in ClientServiceDelegate
> ------------------------------------------------------
>
>                 Key: MAPREDUCE-6312
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6312
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: client
>    Affects Versions: 2.5.0
>            Reporter: Radim Kubacki
>
> ClientServiceDelegate initializes its realProxy field to AMProxy for a new or running job. Later, when the job finishes, it does not update this proxy to query the history server, and the AM will not return valid data for this job.
> We found this while investigating https://issues.cloudera.org/browse/DISTRO-631, which describes a Hive failure caused by a polling loop like this:
> {code}
>   progress(JobClient jc, RunningJob rj) { ...
>         while (!rj.isComplete() || (extraRounds > 0)) {
>             try {
>                 Thread.sleep(1000);
>             } catch (InterruptedException e) {
>             }
>             RunningJob newRj = jc.getJob(rj.getID());
>             if (newRj == null) {
>                 // under exceptional load, hadoop may not be able to look up status
>                 // of finished jobs (because it has purged them from memory). From
>                 // hive's perspective - it's equivalent to the job having failed.
>                 // So raise a meaningful exception
>                 throw new IOException("Could not find status of job:" + rj.getID());
>             } else {
>                 rj = newRj;
>             }
>         }
> {code}
> In this snippet, JobClient.getJob tries to create a RunningJob instance referring to the job file in /user/$USER/.staging even after the job has finished and the file has been moved to /user/history/done (or /user/history/done_intermediate).
> Note that Hive queries can still succeed if, by chance, HDFS performs the actual file delete with a delay.
> We can try to write a patch if there is an agreement that this should be fixed.
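To make the stale-proxy failure in the description concrete, here is a minimal, self-contained Java sketch. It does not use Hadoop: `StaleProxyDemo`, `Delegate`, `StatusProxy`, `JobState` and the status strings are hypothetical stand-ins for ClientServiceDelegate, the AM proxy and the history-server proxy. The `refreshOnFinish` flag contrasts the reported behaviour (the proxy chosen while the job is running is kept forever) with one possible fix (switching to the history server once the job has finished); it is not the patch from this issue or from HIVE-8339.

```java
import java.util.function.Supplier;

// Toy model of the stale-proxy problem. None of these types are Hadoop
// classes; they are hypothetical stand-ins used only for illustration.
public class StaleProxyDemo {

    enum JobState { RUNNING, FINISHED }

    // Anything that can answer "what is the status of this job?".
    interface StatusProxy {
        String getJobStatus();
    }

    // Hypothetical stand-in for ClientServiceDelegate.
    static class Delegate {
        private final Supplier<JobState> jobState; // current state of the job
        private final StatusProxy amProxy;         // talks to the ApplicationMaster
        private final StatusProxy historyProxy;    // talks to the history server
        private final boolean refreshOnFinish;     // false = behaviour reported in this issue
        private StatusProxy realProxy;             // proxy cached after the first call

        Delegate(Supplier<JobState> jobState, StatusProxy amProxy,
                 StatusProxy historyProxy, boolean refreshOnFinish) {
            this.jobState = jobState;
            this.amProxy = amProxy;
            this.historyProxy = historyProxy;
            this.refreshOnFinish = refreshOnFinish;
        }

        String getJobStatus() {
            boolean finished = jobState.get() == JobState.FINISHED;
            // Pick a proxy on first use; optionally replace a cached AM proxy
            // with the history-server proxy once the job has finished.
            if (realProxy == null || (refreshOnFinish && finished && realProxy == amProxy)) {
                realProxy = finished ? historyProxy : amProxy;
            }
            return realProxy.getJobStatus();
        }
    }

    public static void main(String[] args) {
        JobState[] state = { JobState.RUNNING };
        // Once the job has finished, the AM can no longer return valid data.
        StatusProxy am = () -> state[0] == JobState.RUNNING ? "RUNNING" : "UNKNOWN (AM gone)";
        StatusProxy history = () -> "SUCCEEDED";

        Delegate stale = new Delegate(() -> state[0], am, history, false);
        Delegate fixed = new Delegate(() -> state[0], am, history, true);

        System.out.println(stale.getJobStatus() + " / " + fixed.getJobStatus());
        state[0] = JobState.FINISHED;
        System.out.println(stale.getJobStatus() + " / " + fixed.getJobStatus());
    }
}
```

Running `main` prints `RUNNING / RUNNING` and then `UNKNOWN (AM gone) / SUCCEEDED`: only the delegate that re-checks the job state after completion reaches the history server.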
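The staging-file lookup and the timing effect described in the issue can be illustrated with a small toy example. In this sketch an in-memory set of paths stands in for HDFS; the user name `alice`, the job ID and the `job.xml` file names are hypothetical, and the paths are a simplified version of the directories named in the description (the real history directories are more deeply nested). `staleLookup` and `fallbackLookup` are illustrative helpers, not JobClient methods. The sketch shows why a delayed delete lets the stale lookup keep succeeding, and why it fails once the staging copy is gone, which is the case in which the quoted Hive loop throws.

```java
import java.util.HashSet;
import java.util.Optional;
import java.util.Set;

// Toy model of the job-file lookup. An in-memory set of paths stands in for
// HDFS; the user name, job ID and file names are hypothetical.
public class JobFileLookupDemo {

    static final String USER = "alice"; // hypothetical $USER
    static final Set<String> fs = new HashSet<>(); // paths that currently exist

    static String stagingPath(String jobId) {
        return "/user/" + USER + "/.staging/" + jobId + "/job.xml";
    }

    static String intermediatePath(String jobId) {
        return "/user/history/done_intermediate/" + jobId + "/job.xml";
    }

    static String donePath(String jobId) {
        return "/user/history/done/" + jobId + "/job.xml";
    }

    // Reported behaviour: only ever looks in the staging directory.
    static Optional<String> staleLookup(String jobId) {
        String p = stagingPath(jobId);
        return fs.contains(p) ? Optional.of(p) : Optional.empty();
    }

    // Illustrative alternative: fall back to the history directories.
    static Optional<String> fallbackLookup(String jobId) {
        for (String p : new String[] { stagingPath(jobId), intermediatePath(jobId), donePath(jobId) }) {
            if (fs.contains(p)) {
                return Optional.of(p);
            }
        }
        return Optional.empty();
    }

    // The finished job's file appears under done_intermediate; the staging copy
    // is deleted now, or left in place for a while if the delete is delayed.
    static void finishJob(String jobId, boolean deleteDelayed) {
        fs.add(intermediatePath(jobId));
        if (!deleteDelayed) {
            fs.remove(stagingPath(jobId));
        }
    }

    public static void main(String[] args) {
        String job = "job_1_0001"; // hypothetical job ID
        fs.add(stagingPath(job));
        finishJob(job, true);                               // delete has not happened yet
        System.out.println(staleLookup(job).isPresent());   // true: the query still "works"
        fs.remove(stagingPath(job));                        // the delayed delete lands
        System.out.println(staleLookup(job).isPresent());   // false: the lookup fails
        System.out.println(fallbackLookup(job).orElse("none"));
    }
}
```

Running `main` prints `true`, `false`, and then `/user/history/done_intermediate/job_1_0001/job.xml`: the stale lookup only succeeds while the staging copy happens to survive.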



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
