hadoop-mapreduce-issues mailing list archives

From "Hudson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-7131) Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate
Date Thu, 06 Sep 2018 20:05:00 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-7131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16606339#comment-16606339 ]

Hudson commented on MAPREDUCE-7131:
-----------------------------------

ABORTED: Integrated in Jenkins build Hadoop-trunk-Commit #14887 (See [https://builds.apache.org/job/Hadoop-trunk-Commit/14887/])
MAPREDUCE-7131. Job History Server has race condition where it moves (jlowe: rev eb0b5a844f960017f6f48d746174d0f5826f0e5f)
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/test/java/org/apache/hadoop/mapreduce/v2/hs/TestHistoryFileManager.java
* (edit) hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-hs/src/main/java/org/apache/hadoop/mapreduce/v2/hs/HistoryFileManager.java


> Job History Server has race condition where it moves files from intermediate to finished but thinks file is in intermediate
> ---------------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-7131
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-7131
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>    Affects Versions: 2.7.4
>            Reporter: Anthony Hsu
>            Assignee: Anthony Hsu
>            Priority: Major
>         Attachments: MAPREDUCE-7131.1.patch, MAPREDUCE-7131.2.patch, MAPREDUCE-7131.3.patch, MAPREDUCE-7131.4.patch, MAPREDUCE-7131.5.patch, MAPREDUCE-7131.6.patch
>
>
> This is the race condition that can occur:
> # during the first *scanIntermediateDirectory()*, *HistoryFileInfo.moveToDone()* is scheduled for job j1
> # during the second *scanIntermediateDirectory()*, j1 is found again and put in the *fileStatusList* to process
> # *HistoryFileInfo.moveToDone()* is processed in another thread and history files are moved to the finished directory
> # the *HistoryFileInfo* for j1 is removed from *jobListCache*
> # the j1 in *fileStatusList* is processed and a new *HistoryFileInfo* for j1 is created (history, conf, and summary files will point to the intermediate user directory, and state will be IN_INTERMEDIATE) and added to the *jobListCache*
> # *moveToDone()* is scheduled for this new j1
> # *moveToDone()* fails during *moveToDoneNow()* for the history file because the source path in the intermediate directory does not exist
> From this point on, while the new j1 *HistoryFileInfo* is in the *jobListCache*, the JobHistoryServer will think the history file is in the intermediate directory. If a user queries this job in the JobHistoryServer UI, they will get
> {code}
> org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Could not load history file <scheme>://<host>:<port>/mr-history/intermediate/<user>/job_1529348381246_27275711-1535123223269-<user>-<jobname>-1535127026668-1-0-SUCCEEDED-<queue>-1535126980787.jhist
> {code}
> Noticed this issue while running 2.7.4, but the race condition still seems to exist in trunk.
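
The interleaving described above can be replayed deterministically in a small model. The sketch below is not Hadoop code: the class HistoryRaceSketch, its Info type, the two in-memory sets standing in for the intermediate and finished directories, and the recheckBeforeCaching flag are all hypothetical names introduced for illustration. It runs the seven steps in order on a single thread and reports whether the cache ends up holding an entry that still points at the intermediate directory after the file has been moved. The optional re-check is only an assumed mitigation for comparison and is not claimed to be what the committed patch does.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical, single-threaded model of the interleaving; not Hadoop code.
class HistoryRaceSketch {

    // Stand-in for HistoryFileInfo: records only where the server believes the file is.
    static final class Info {
        String believedLocation = "intermediate";
    }

    private final Set<String> intermediateDir = new HashSet<>();
    private final Set<String> doneDir = new HashSet<>();
    private final Map<String, Info> jobListCache = new HashMap<>();

    // Stand-in for scanIntermediateDirectory(): snapshot of the job ids currently present.
    private List<String> scan() {
        return new ArrayList<>(intermediateDir);
    }

    // Stand-in for moveToDone(): succeeds only if the source is still in intermediate.
    private boolean moveToDone(String jobId, Info info) {
        if (!intermediateDir.remove(jobId)) {
            return false; // source path missing; believedLocation is left unchanged
        }
        doneDir.add(jobId);
        info.believedLocation = "done";
        return true;
    }

    // Replays steps 1-7 in order. Returns true if the cache ends up with an entry
    // that still points at the intermediate directory although the file is gone.
    static boolean staleEntryAfterRace(boolean recheckBeforeCaching) {
        HistoryRaceSketch hs = new HistoryRaceSketch();
        hs.intermediateDir.add("j1");

        // Step 1: the first scan finds j1 and caches it; the move is scheduled (deferred).
        List<String> firstScan = hs.scan();
        Info first = new Info();
        hs.jobListCache.put(firstScan.get(0), first);

        // Step 2: the second scan still sees j1 and queues it for processing.
        List<String> fileStatusList = hs.scan();

        // Step 3: the deferred move runs and the file lands in the finished directory.
        hs.moveToDone("j1", first);

        // Step 4: the original entry for j1 leaves the cache.
        hs.jobListCache.remove("j1");

        // Steps 5-7: the queued j1 is processed from the now outdated scan result.
        for (String jobId : fileStatusList) {
            if (recheckBeforeCaching && !hs.intermediateDir.contains(jobId)) {
                continue; // assumed guard: the source is gone, so do not re-create the entry
            }
            Info second = new Info();
            hs.jobListCache.put(jobId, second);
            hs.moveToDone(jobId, second); // fails: nothing left in intermediate
        }

        Info cached = hs.jobListCache.get("j1");
        return cached != null
            && "intermediate".equals(cached.believedLocation)
            && !hs.intermediateDir.contains("j1");
    }

    public static void main(String[] args) {
        System.out.println("stale entry without re-check: " + staleEntryAfterRace(false));
        System.out.println("stale entry with re-check:    " + staleEntryAfterRace(true));
    }
}
```

In this model the stale entry appears only because the second scan result is consumed after the move has completed and the first entry has left the cache. Re-validating the scan result before caching closes the window in the model, but in a real multi-threaded server that check and the cache update would also have to be atomic with respect to the mover thread.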



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: mapreduce-issues-unsubscribe@hadoop.apache.org
For additional commands, e-mail: mapreduce-issues-help@hadoop.apache.org

