hadoop-mapreduce-issues mailing list archives

From "Maysam Yabandeh (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-5267) History server should be more robust when cleaning old jobs
Date Tue, 04 Jun 2013 03:23:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-5267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13673993#comment-13673993
] 

Maysam Yabandeh commented on MAPREDUCE-5267:
--------------------------------------------

I believe the particular bug reported in this JIRA is rooted in the implementation of listFiles.
I attempted to reproduce the reported scenario by creating a directory DIR under /mapred/history/done/
with root-only access. On my local machine, the current unit tests pass over DIR without
error, because listFiles() returns an empty list for it. I guess this is not the case on
HDFS, where, as this JIRA reports, an exception would be raised (although I have not managed
to run a unit test that exercises this).

Nevertheless, I agree with you that this problem should be addressed at a higher level, since
we cannot predict which scenario will next raise an exception in the clean procedure.

I would like to pick up this JIRA, but I do not know how to write a unit test that exercises
a method by raising (general) exceptions in the middle of it.
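
One possible way to approach such a test, sketched under assumptions: wrap the step the method calls in a small stub that throws on its n-th invocation, then assert that the remaining items were still processed. All names here (FaultInjectionSketch, failOnNthCall, processAll) are hypothetical and not part of the history server code; a mocking library such as Mockito could play the same role as the hand-written wrapper.

```java
import java.util.List;
import java.util.function.Consumer;

class FaultInjectionSketch {
    /**
     * Wraps a step so that its n-th invocation (1-based) throws the given
     * exception instead of running. Hypothetical test helper.
     */
    static <T> Consumer<T> failOnNthCall(Consumer<T> delegate, int n, RuntimeException toThrow) {
        int[] calls = {0}; // mutable counter captured by the lambda
        return item -> {
            if (++calls[0] == n) {
                throw toThrow;
            }
            delegate.accept(item);
        };
    }

    /**
     * Hypothetical method under test: applies the step to every item and
     * survives per-item failures. Returns the number of items that failed.
     */
    static <T> int processAll(List<T> items, Consumer<T> step) {
        int failures = 0;
        for (T item : items) {
            try {
                step.accept(item);
            } catch (RuntimeException e) {
                // Handled at the loop level, so one bad item does not abort the rest.
                failures++;
            }
        }
        return failures;
    }
}
```

A test would then pass the wrapped step to processAll and check both the failure count and the list of items that were actually handled.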
                
> History server should be more robust when cleaning old jobs
> -----------------------------------------------------------
>
>                 Key: MAPREDUCE-5267
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-5267
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobhistoryserver
>    Affects Versions: 0.23.7, 2.0.4-alpha
>            Reporter: Jason Lowe
>
> Ran across a situation where an admin user had accidentally created a directory in one
> of the date directories under /mapred/history/done/ that was not readable by the historyserver
> user.  That effectively prevented the history server from cleaning any jobs from that date
> forward, as it hit an IOException trying to scan the directory and that aborted the entire
> clean process.
> The history server should localize IOException handling to the directory/file being processed
> and move on to the next entry in the list rather than aborting the entire cleaning process.
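
The localized handling requested above might look roughly like the following sketch. It is an illustration only: it uses java.nio.file as a stand-in for the Hadoop FileSystem API that the history server actually works against, and the class and method names (LocalizedCleanSketch, scanAll) are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryIteratorException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LocalizedCleanSketch {
    /**
     * Scans each date directory independently and appends its entries to
     * 'found'. A directory that cannot be scanned is recorded and skipped,
     * so later directories are still processed. Returns the skipped ones.
     */
    static List<Path> scanAll(List<Path> dateDirs, List<Path> found) {
        List<Path> skipped = new ArrayList<>();
        for (Path dir : dateDirs) {
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    found.add(entry);
                }
            } catch (IOException | DirectoryIteratorException e) {
                // Failure is confined to this directory; move on to the next entry.
                System.err.println("Skipping " + dir + ": " + e);
                skipped.add(dir);
            }
        }
        return skipped;
    }
}
```

The try/catch sits inside the loop rather than around it, which is the whole point: an unreadable directory costs one iteration instead of the entire pass. Entries collected from a directory before it failed part-way remain in 'found'.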

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
