hadoop-mapreduce-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4384) Race conditions in IndexCache
Date Thu, 28 Jun 2012 20:50:44 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13403464#comment-13403464 ]

Kihwal Lee commented on MAPREDUCE-4384:

When {{TestIndexCache}} failed, the log contained a warning message, "Map IDxxxx not found
in queue!!". The queue is used to decide what to evict under the FIFO cache replacement
policy. This message indicates that the cache entry was freed by a {{removeMap()}} call, but the
corresponding entry was not found in the queue.

This can happen if {{removeMap()}} is called while the cache entry is being loaded. If a new
incomplete entry is added to the cache between {{cache.get(mapId)}} and {{cache.remove(mapId)}}
in {{removeMap()}}, the new entry will be removed from the cache. Further, if {{totalMemoryUsed}}
is updated before the entry is fully loaded, it will end up subtracting zero from the usage.
When the loading completes in {{readIndexFileToCache()}}, {{totalMemoryUsed}} is incremented,
but since the entry was already removed from the cache, there is no way it can be decremented.
Hence the discrepancy in memory usage tracking.
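
The interleaving described above can be replayed deterministically on a single thread. The following is only a minimal sketch: the class {{IndexCacheRaceSketch}}, the {{Info}} type, the {{replay()}} method and the size of 100 are hypothetical stand-ins, not the actual {{IndexCache}} code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the cache; not the real IndexCache.
final class IndexCacheRaceSketch {
    static final class Info { volatile int size; }   // size 0 means "still loading"

    static final ConcurrentHashMap<String, Info> cache = new ConcurrentHashMap<>();
    static final AtomicInteger totalMemoryUsed = new AtomicInteger();

    /** Replays the racy interleaving step by step; returns the usage that can no longer be reclaimed. */
    static int replay() {
        cache.clear();
        totalMemoryUsed.set(0);
        String mapId = "map_0";

        // Remover, step 1: the entry is not in the cache yet, so the first check does not return early.
        Info seen = cache.get(mapId);
        boolean earlyReturn = (seen != null) && (seen.size == 0);

        // Loader, step 1: a new, incomplete entry is inserted between get() and remove().
        Info loading = new Info();
        cache.put(mapId, loading);

        // Remover, step 2: removes the incomplete entry and subtracts its size, which is still 0.
        if (!earlyReturn) {
            Info removed = cache.remove(mapId);
            if (removed != null) {
                totalMemoryUsed.addAndGet(-removed.size);
            }
        }

        // Loader, step 2: loading completes and the usage counter is incremented.
        loading.size = 100;
        totalMemoryUsed.addAndGet(loading.size);

        // The entry is gone from the cache, so nothing will ever subtract these bytes.
        return cache.containsKey(mapId) ? 0 : totalMemoryUsed.get();
    }

    public static void main(String[] args) {
        System.out.println("unreclaimable usage = " + replay());
    }
}
```

In this sketch the counter ends at 100 while the cache is empty, which mirrors the accounting discrepancy described above.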

This issue can be fixed by adding one more condition to the first check in {{removeMap()}}:

   IndexInformation info = cache.get(mapId);
 - if ((info != null) && (info.getSize() == 0)) {
 + if (info == null || ((info != null) && (info.getSize() == 0))) {
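
To make the effect of the extra condition concrete, the sketch below compares the original and the proposed check on a missing entry, an incomplete entry and a loaded entry. The class {{RemoveMapCheckSketch}}, the {{Info}} type and the method names are hypothetical; only the two boolean expressions are taken from the diff above. The third variant relies on Java's short-circuit {{||}}, which makes the inner {{info != null}} test redundant.

```java
// Hypothetical stand-in types; only the boolean conditions come from the diff.
final class RemoveMapCheckSketch {
    static final class Info {
        private final int size;
        Info(int size) { this.size = size; }
        int getSize() { return size; }
    }

    // Original first check: returns early only for an entry that is still loading.
    static boolean originalSkips(Info info) {
        return (info != null) && (info.getSize() == 0);
    }

    // Proposed check: also returns early when no entry exists yet.
    static boolean patchedSkips(Info info) {
        return info == null || ((info != null) && (info.getSize() == 0));
    }

    // Equivalent shorter form: the right-hand side is evaluated only when info is non-null.
    static boolean simplifiedSkips(Info info) {
        return info == null || info.getSize() == 0;
    }

    public static void main(String[] args) {
        Info[] cases = { null, new Info(0), new Info(5) };
        for (Info c : cases) {
            System.out.println(originalSkips(c) + " " + patchedSkips(c) + " " + simplifiedSkips(c));
        }
    }
}
```

With the proposed check, a {{removeMap()}} call that finds no entry returns before reaching {{cache.remove(mapId)}}, so an entry inserted just afterwards is not removed by that call.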

Another potential issue is in {{readIndexFileToCache()}}. When two different threads try
to add the same entry to the cache, there can be a deadlock. When Thread A puts a new {{IndexInformation}}
object in the cache, Thread B can come in slightly later and call {{wait()}} on this object until
it is fully ready. The {{wait()}} is inside the {{synchronized(info)}} block, and {{info}} is
the new object it just found in the cache. But Thread A also tries to update the same object
and call {{notifyAll()}} inside a {{synchronized()}} block on it. This results in a deadlock.
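
The locking pattern under discussion can be sketched in isolation as follows. This is an illustration only, with hypothetical names ({{WaitNotifySketch}}, {{Info}}, {{ready}}, {{runHandoff()}}) that are not taken from {{IndexCache}}. In this simplified form the waiter re-checks a flag in a loop, and {{Object.wait()}} releases the monitor while the thread is blocked, so the handoff completes; the sketch does not reproduce the hang reported here and only shows the two {{synchronized}} blocks involved.

```java
// Hypothetical, simplified model of the waiter/loader interaction.
final class WaitNotifySketch {
    static final class Info {
        int size;        // guarded by the Info monitor
        boolean ready;   // guarded by the Info monitor
    }

    /** Runs one waiter thread (B) and one loader (A, the calling thread); returns the size B observed. */
    static int runHandoff() {
        final Info info = new Info();
        final int[] observed = new int[1];

        // Thread B: finds the entry and waits on it until it is fully loaded.
        Thread b = new Thread(() -> {
            synchronized (info) {
                while (!info.ready) {            // guard against spurious wakeups
                    try {
                        info.wait();             // releases the monitor while blocked
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
                observed[0] = info.size;
            }
        });
        b.start();

        // Thread A: updates the same object and wakes up all waiters.
        synchronized (info) {
            info.size = 42;
            info.ready = true;
            info.notifyAll();
        }

        try {
            b.join();                            // join() makes B's write to observed[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return observed[0];
    }

    public static void main(String[] args) {
        System.out.println("observed size = " + runHandoff());
    }
}
```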

> Race conditions in IndexCache
> -----------------------------
>                 Key: MAPREDUCE-4384
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4384
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: nodemanager
>    Affects Versions: 2.0.0-alpha
>            Reporter: Kihwal Lee
>             Fix For: 0.23.3, 2.0.1-alpha, 3.0.0
> TestIndexCache is intermittently failing due to a race condition. Upon inspection of the
> IndexCache implementation, more potential issues have been discovered.
