hadoop-mapreduce-issues mailing list archives

From "Alejandro Abdelnur (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3972) Locking and exception issues in JobHistory Server.
Date Wed, 11 Apr 2012 09:58:25 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13251450#comment-13251450 ]

Alejandro Abdelnur commented on MAPREDUCE-3972:

NITs: CachedHistoryStorage.getFullJob(): local variable 'metaInfo' should be renamed to 'fileInfo'.
      CachedHistoryStorage.getAllPartialJobs(): loop variable 'mi' should be renamed to 'fileInfo'.
      HistoryFileManager.addDirecotryToJobListCache(), scanIntermediateDirectory(), getJobMetaInfo(), scanOldDirsForJob(), getMetaInfo() and clean(): local variable 'metaInfo' should be renamed to 'fileInfo'.
      HistoryFileManager.deleteJobFromDone(): parameter 'metatInfo' should be renamed to 'fileInfo'.

RENAMEs: the HistoryFileManager methods getJobMetaInfo()/getAllMetaInfo()/getMetaInfo(..) should be renamed to use HistoryFileInfo in their names. As they stand, the names getJobMetaInfo() and getMetaInfo() do not make clear how the two methods differ.

PATCH NEEDS REBASE: the changes to TestJobHistoryParsing.java do not apply cleanly because of minor upstream changes.

IMPROVEMENTs: in JobHistory.stop(), the schedulerExecutor should be stopped with shutdownNow() once the grace period has expired.
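A minimal sketch of the suggested stop() behavior, using only java.util.concurrent: call shutdown(), wait up to a grace period with awaitTermination(), and fall back to shutdownNow() if tasks are still running. The class name, GRACE_PERIOD_MS and the 500 ms value are hypothetical; this is not the actual JobHistory code.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SchedulerShutdownSketch {

    // Hypothetical grace period; a real server would read this from configuration.
    static final long GRACE_PERIOD_MS = 500;

    /**
     * Orderly shutdown: stop accepting new tasks, wait up to the grace period
     * for running tasks to finish, then interrupt anything still running.
     * Returns true if the executor has terminated.
     */
    static boolean stopScheduler(ScheduledExecutorService scheduler) {
        scheduler.shutdown();
        try {
            if (!scheduler.awaitTermination(GRACE_PERIOD_MS, TimeUnit.MILLISECONDS)) {
                // Grace period expired: drop queued tasks and interrupt running ones.
                scheduler.shutdownNow();
                return scheduler.awaitTermination(GRACE_PERIOD_MS, TimeUnit.MILLISECONDS);
            }
            return true;
        } catch (InterruptedException e) {
            scheduler.shutdownNow();
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
        // A long-running task that outlives the grace period, so shutdownNow() is needed.
        scheduler.schedule(() -> {
            try {
                Thread.sleep(10_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, TimeUnit.MILLISECONDS);
        System.out.println("terminated: " + stopScheduler(scheduler));
    }
}
```

Without the shutdownNow() fallback, a task stuck in a long scan would keep stop() (or the JVM) waiting well past the intended grace period.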

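ERRORs: HistoryFileManager$JobListCache.values() is synchronized, but that does not synchronize access to the values themselves: the returned collection is still backed by the TreeMap, so callers iterate it outside the lock. Either the values have to be copied within this method, or a concurrent data structure should be used instead of TreeMap.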
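The two fixes suggested above can be sketched as follows. The class names are hypothetical, and String keys and values stand in for the real job ID and HistoryFileInfo types. Option 1 keeps the TreeMap but returns a snapshot copied while holding the lock; option 2 switches to ConcurrentSkipListMap, which stays sorted and whose views are weakly consistent, so iterating them never throws ConcurrentModificationException.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class JobListCacheSketch {

    /** Option 1: keep the TreeMap, but return a snapshot copied under the lock. */
    static class CopyingCache {
        private final TreeMap<String, String> cache = new TreeMap<>();

        synchronized void put(String jobId, String fileInfo) {
            cache.put(jobId, fileInfo);
        }

        synchronized Collection<String> values() {
            // TreeMap.values() is a live view; copying it while holding the lock
            // lets callers iterate without racing concurrent put()/remove() calls.
            return new ArrayList<>(cache.values());
        }
    }

    /** Option 2: a sorted concurrent map; no external locking needed for iteration. */
    static class ConcurrentCache {
        private final ConcurrentSkipListMap<String, String> cache = new ConcurrentSkipListMap<>();

        void put(String jobId, String fileInfo) {
            cache.put(jobId, fileInfo);
        }

        Collection<String> values() {
            // Weakly consistent view: safe to iterate while other threads modify the map.
            return cache.values();
        }
    }

    public static void main(String[] args) {
        CopyingCache copying = new CopyingCache();
        copying.put("job_2", "b");
        copying.put("job_1", "a");
        Collection<String> snapshot = copying.values();
        copying.put("job_3", "c");     // does not affect the snapshot already returned
        System.out.println(snapshot);  // [a, b]

        ConcurrentCache concurrent = new ConcurrentCache();
        concurrent.put("job_2", "b");
        concurrent.put("job_1", "a");
        for (String v : concurrent.values()) {
            if (v.equals("a")) {
                concurrent.put("job_3", "c"); // no ConcurrentModificationException
            }
        }
        System.out.println(concurrent.values()); // [a, b, c]
    }
}
```

Copying costs O(n) per call but keeps the existing TreeMap; the concurrent map avoids the copy at the price of callers possibly seeing entries added mid-iteration.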

> Locking and exception issues in JobHistory Server.
> --------------------------------------------------
>                 Key: MAPREDUCE-3972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3972
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: MR-3972.txt, MR-3972.txt, MR-3972.txt
> The JobHistory server's locking is inconsistent and, in some cases, wrong. This is not critical, because the problems would only show up if a job is being cleaned up, or moved from intermediate done to done, at the same time as it is being parsed into a CompletedJob. However, the locking is slowing down the server in some cases, and it is a ticking time bomb that needs to be addressed.
> As part of this, we also need to make sure that the Cleaner and Intermediate-to-Done migration threads handle exceptions properly. Currently it appears that the exception is logged and the thread simply shuts down. This means the history server could stay up and running for weeks without ever removing old jobs.
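One way to keep a periodic background task alive after a failure is sketched below, assuming the task runs on a ScheduledExecutorService. The scheduleAtFixedRate Javadoc states that if any execution throws, subsequent executions are suppressed, so one bad run silently stops all future cleaning. Wrapping the body in a try/catch that logs and returns keeps the schedule going. The names logAndContinue and ResilientCleanerSketch, the simulated failure and the 50 ms period are hypothetical; the actual Cleaner and migration threads may use a different threading mechanism.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

public class ResilientCleanerSketch {

    private static final Logger LOG = Logger.getLogger(ResilientCleanerSketch.class.getName());

    /**
     * Wraps a periodic task so that a failed run is logged and the next
     * scheduled run still happens. Errors (e.g. OutOfMemoryError) are
     * deliberately not caught.
     */
    static Runnable logAndContinue(String name, Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (RuntimeException e) {
                LOG.log(Level.WARNING, name + " run failed; will retry on next schedule", e);
            }
        };
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        // Hypothetical cleaner body that fails on its first run only.
        Runnable cleaner = () -> {
            if (runs.incrementAndGet() == 1) {
                throw new IllegalStateException("simulated failure while deleting an old job");
            }
        };

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(logAndContinue("Cleaner", cleaner), 0, 50, TimeUnit.MILLISECONDS);
        Thread.sleep(300);
        scheduler.shutdownNow();
        scheduler.awaitTermination(1, TimeUnit.SECONDS);

        // More than one run means the task survived its first failure.
        System.out.println("runs after failure: " + runs.get());
    }
}
```

Without the wrapper, the same schedule would execute the cleaner exactly once and then stop, which matches the "up for weeks, never removes old jobs" symptom described above.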


