hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4033) time lag between job completion and job being avail in JH server makes Oozie fail
Date Thu, 29 Mar 2012 15:50:29 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13241325#comment-13241325 ]

Robert Joseph Evans commented on MAPREDUCE-4033:
------------------------------------------------

It looks like your job and the history server have different configuration values for where
to write/read the jhist files.

I see your Oozie job creating the directory

/tmp/hadoop-yarn/staging/history/done_intermediate/test

But I see the history server looking for jobs under

/home/tucu/src/cloudera/oozietucu/core/target/org.apache.hadoop.mapred.MiniMRCluster/apps_staging_dir/history/done_intermediate

That looks like the MiniCluster overriding the value for the case where we don't use HDFS.

So when the test passes, it probably got the status from the AM before the AM exited. When
it fails, it tried to get the status from the history server, but the history server has no
knowledge of your job because the files are not where it expects them to be. I am not very
familiar with the mini cluster, so I am not sure where to look to fix this.
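
To illustrate the mismatch described above, here is a minimal sketch in plain Java. It is not
code from Hadoop or Oozie. It assumes the intermediate done directory is taken from
mapreduce.jobhistory.intermediate-done-dir when set, and otherwise derived from
yarn.app.mapreduce.am.staging-dir (default /tmp/hadoop-yarn/staging) plus
/history/done_intermediate, which is consistent with the two paths quoted above. The class
and helper names are hypothetical, and plain maps stand in for the real Configuration objects.

```java
import java.util.HashMap;
import java.util.Map;

public class HistoryDirCheck {
    // Property names and default value are assumptions based on Hadoop's default configuration.
    static final String STAGING_DIR = "yarn.app.mapreduce.am.staging-dir";
    static final String INTERMEDIATE_DONE_DIR = "mapreduce.jobhistory.intermediate-done-dir";
    static final String DEFAULT_STAGING_DIR = "/tmp/hadoop-yarn/staging";

    // Hypothetical helper: resolve where jhist files are first written/read.
    static String intermediateDoneDir(Map<String, String> conf) {
        String explicit = conf.get(INTERMEDIATE_DONE_DIR);
        if (explicit != null) {
            return explicit;
        }
        String staging = conf.getOrDefault(STAGING_DIR, DEFAULT_STAGING_DIR);
        return staging + "/history/done_intermediate";
    }

    // True when both sides would use the same intermediate done directory.
    static boolean sameHistoryDir(Map<String, String> jobConf, Map<String, String> serverConf) {
        return intermediateDoneDir(jobConf).equals(intermediateDoneDir(serverConf));
    }

    public static void main(String[] args) {
        // Job side: nothing overridden, so the assumed default staging dir applies.
        Map<String, String> jobConf = new HashMap<>();

        // History server side: staging dir overridden by the mini cluster (path from the logs).
        Map<String, String> serverConf = new HashMap<>();
        serverConf.put(STAGING_DIR,
            "/home/tucu/src/cloudera/oozietucu/core/target/"
                + "org.apache.hadoop.mapred.MiniMRCluster/apps_staging_dir");

        System.out.println("job writes to:    " + intermediateDoneDir(jobConf));
        System.out.println("server reads from: " + intermediateDoneDir(serverConf));
        System.out.println("match: " + sameHistoryDir(jobConf, serverConf));
    }
}
```

If the two resolved directories differ, the history server never sees the job's files, so
passing the mini cluster's configuration to the job submission is the natural thing to check.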
                
> time lag between job completion and job being avail in JH server makes Oozie fail
> ---------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-4033
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4033
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 0.23.3
>            Reporter: Alejandro Abdelnur
>            Priority: Critical
>             Fix For: 2.0.0
>
>         Attachments: minicluster-oozie-pig.txt
>
>
> Oozie testcases are failing randomly because MR2 reports the job as unknown.
> This seems to happen when Oozie queries via JobClient.getJob(<JOBID>) for a <JOBID> that just finished.
> {code}
> org.apache.oozie.action.ActionExecutorException: JA017: Unknown hadoop job [job_1332176678205_0011] associated with action [0000000-120319101023910-oozie-tucu-W@pig-action].  Failing this action!
> {code}
> Oozie reports this error when JobClient.getJob(<JOBID>) returns NULL.
> Looking at the mini cluster logs, the job definitely ran.
> {code}
>  find . -name "*1332176678205_0011*"
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_0/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_2/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_3/application_1332176678205_0011/container_1332176678205_0011_01_000001
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000002
> ./core/target/org.apache.hadoop.mapred.MiniMRCluster/org.apache.hadoop.mapred.MiniMRCluster-logDir-nm-0_1/application_1332176678205_0011/container_1332176678205_0011_01_000001
> {code}
> It seems there is a gap until the job is available in the JH server.
> If this gap is unavoidable, we need to ensure Oozie always waits at least the gap time before querying for a job.
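
If a real gap does remain once both sides agree on the history directories, the "wait before
querying" idea from the description could be sketched as a bounded poll. This is only an
illustration in plain Java, not Oozie's implementation: the class name, the helper, and its
parameters are hypothetical, and the Supplier stands in for a call such as
JobClient.getJob(<JOBID>) that returns null while the job is still unknown.

```java
import java.util.function.Supplier;

public class PollUntilAvailable {
    // Hypothetical helper: call lookup up to maxAttempts times, sleeping between
    // attempts, and return the first non-null result (or null if none appears).
    static <T> T poll(Supplier<T> lookup, int maxAttempts, long sleepMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            T result = lookup.get();
            if (result != null) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return null;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in lookup: the job becomes visible on the third query.
        String job = poll(() -> ++calls[0] < 3 ? null : "job_1332176678205_0011", 5, 100L);
        System.out.println(job + " found after " + calls[0] + " queries");
    }
}
```

A bounded poll like this only hides a genuine delay. It does not help when the history server
is reading from a different directory than the one the job wrote to.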

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
