hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Updated) (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (MAPREDUCE-3972) Locking and exception issues in JobHistory Server.
Date Wed, 11 Apr 2012 16:09:17 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-3972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Joseph Evans updated MAPREDUCE-3972:
-------------------------------------------

    Attachment: MR-3972.txt

This patch addresses Alejandro's review comments.

For the jobListCache.values() issue, there was only one call site, and it was already
making a copy of the returned collection, so I moved that copy inside the
synchronized block.

I am not sure that this is the correct solution from a performance perspective, and I would
like some feedback on it.
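
To make the change concrete, here is a minimal sketch of the pattern, not the patch
itself. The class name, the String key/value types, and the put() and snapshot() methods
are hypothetical stand-ins; only the idea of copying jobListCache.values() while holding
the lock comes from the change described above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the history server's job cache; not the real Hadoop class.
final class JobListCacheSketch {
    // Assumed: a plain, non-thread-safe map guarded by its own monitor.
    private final Map<String, String> jobListCache = new LinkedHashMap<>();

    void put(String jobId, String metaInfo) {
        synchronized (jobListCache) {
            jobListCache.put(jobId, metaInfo);
        }
    }

    // values() is only a live view of the map, so the copy has to be made
    // while the lock is held; copying after releasing the lock would
    // iterate the view concurrently with writers.
    List<String> snapshot() {
        synchronized (jobListCache) {
            return new ArrayList<>(jobListCache.values());
        }
    }
}
```

The cost is that the lock is held for the whole O(n) copy, which is the performance
concern. Callers then iterate over the snapshot without any locking.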
                
> Locking and exception issues in JobHistory Server.
> --------------------------------------------------
>
>                 Key: MAPREDUCE-3972
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3972
>             Project: Hadoop Map/Reduce
>          Issue Type: Sub-task
>          Components: mrv2
>    Affects Versions: 0.23.2
>            Reporter: Robert Joseph Evans
>            Assignee: Robert Joseph Evans
>         Attachments: MR-3972.txt, MR-3972.txt, MR-3972.txt, MR-3972.txt
>
>
> The JobHistory server's locking is inconsistent and, in some cases, wrong.  This is not
> super critical, because the issues would only show up if a job is being cleaned up or moved
> from intermediate done to done at the same time it is being parsed into a CompletedJob.
> However, the locking is slowing down the server in some cases, and is a ticking time bomb that
> needs to be addressed.
> As part of this, we also need to be sure that the Cleaner and Intermediate to Done migration
> threads handle exceptions properly.  Currently it appears that the exception is logged and the
> thread just shuts down.  This means that the history server could still be up and running
> for weeks and never remove old jobs.
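
One way to keep a periodic thread alive after a failure is to catch the exception inside
each run, so that a bad cycle is logged and the next cycle still happens. The sketch below
assumes the periodic work is expressed as a Runnable; the class and method names are
hypothetical and are not taken from the patch. It is relevant in particular if the work is
scheduled with ScheduledExecutorService.scheduleAtFixedRate, which suppresses all later
executions once a run throws.

```java
// Hypothetical helper; not part of Hadoop.
final class ResilientTaskSketch {
    private ResilientTaskSketch() {
    }

    // Wraps a task so that an exception in one run is reported but never
    // propagates to the caller (for example, a scheduler thread).
    static Runnable guarded(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Exception e) {
                // Assumed logging: a real server would use its own logger.
                System.err.println("Periodic task failed; will retry next cycle: " + e);
            }
        };
    }
}
```

With this wrapper, something like scheduler.scheduleAtFixedRate(guarded(cleaner), ...)
would keep firing after a failed run. Errors such as OutOfMemoryError are deliberately not
caught in this sketch.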

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
