hadoop-common-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Aaron Greenhouse (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-3557) Class JobControl needs to be rewritten for safe concurrency
Date Mon, 16 Jun 2008 17:39:46 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-3557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Aaron Greenhouse updated HADOOP-3557:
-------------------------------------

    Attachment: HADOOP-3557.patch

New patch file.  This one properly generated by SVN itself. Shows changes to callers that
used to use allFinished(); they now use waitForAllFinished().


> Class JobControl needs to be rewritten for safe concurrency
> -----------------------------------------------------------
>
>                 Key: HADOOP-3557
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3557
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.17.0
>            Reporter: Aaron Greenhouse
>         Attachments: HADOOP-3557.patch
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The use of concurrency control and synchronization in JobControl is broken.  Consider
> - The locking in addToQueue() and toArrayList() is supposed to create per-queue locking
for the queues referenced by the fields readyJobs, runningJobs, successfulJobs, waitingJobs,
and failedJobs.  But this is compromised by the fact that these methods are always called
when the thread is already synchronized on "this".
> - The run() method is written to use a busy-loop and Thread.sleep(int). But runnerState
is not generally accessed using synchronization, nor is it volatile. This means that the run()
is not guaranteed to ever view updates to runnerState.  The run() method really needs to be
changed to use wait(), and to be signaled when changes are made to runnerState or to any of
the queues.
> Let's also consider the existing synchronized methods:
> addJob() -- I'm pretty sure that the synchronization in addJob is only necessary to protect
the (indirect) access to the long field nextJobID. This situation could be simplified by moving
the synchronization to getNextJobID(). As it stands, it makes the locking in addToQueue()
redundant. 
> allFinished() -- This is synchronized to provide an atomic snapshot of the state of the
queues. It should be changed to use wait() and be notified whenever the size of one of the
queues changes. 
> checkRunningJobs(), checkWaitingJobs(), and startReadyJobs() -- These are all synchronized
to prevent them from interfering with each other, themselves, and with addJobs(). This is
too conservative. We really only need to be sure that each one doesn't interfere with itself
(otherwise we could end up processing entries twice in parallel), but this is not possible
based on the intended usage: they are are only called from the run(), so they can never be
concurrency invoked. I don't see any danger from them running concurrently with addJob(),
as long as the integrity of the underlying queues is protected. As with addJob(), the synchronization
on these methods makes the per-queue synchronization in addToQueue redundant. 
> This class should be rewritten as follows:
>     * The field groupName should be made final. 
>     * Add a new private boolean field stateChanged that is set to true whenever the state
of the object is changed from a public method call. This is checked and reset by run(). 
>     * The per-queue synchronization needs to be scrapped: it isn't being exploited any
way. 
>     * The five queue fields plus the fields runnerState, nextJobID, and the new stateChanged
should all be protected by the lock this. 
>     * The run() method should be rewritten to use wait(). 
>     * public methods that change the state of the queues or runnerState should set stateChanged
to true and call notifyAll(). 
>     * Ideally, allFinished() should be changed to use wait() as well. This would change
the public interface of the class and break existing clients.  So this method should be @Deprecated,
and a new waitForAllFinished() method should be added.
>     * The queue fields should be assigned HashMap objects instead of Hashtable objects.
There is no advantage to using the synchronization provided by Hashtable. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message