hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1098) Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.
Date Tue, 13 Oct 2009 17:17:31 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765155#action_12765155 ]

Arun C Murthy commented on MAPREDUCE-1098:
------------------------------------------

Let me elaborate on my concerns:

# This patch doesn't fix TrackerDistributedCacheManager.deleteCache, which holds the same global
lock while it deletes stale cache files: a *big* hole.
# This patch changes the locking order (i.e. lock cachedArchives followed by lcacheStatus) in one
and only one place in the whole system while keeping it the same everywhere else! This, again,
is potentially a bad call. At the very least you need to fix TrackerDistributedCacheManager.deleteCache
and justify how changing the locking order is *ok*!
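
To make the lock-ordering concern concrete, here is a minimal sketch, not the actual Hadoop code. It assumes a global map (named cachedArchives after the field mentioned above) and a per-entry status object. The class LockOrderSketch, its methods, and the CacheStatus fields are hypothetical. Every path takes the map lock first and the entry lock second, and the slow localization step runs with only the entry lock held, so the delete path cannot deadlock against it.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

final class LockOrderSketch {
    /** Hypothetical per-entry state; stands in for the per-cache status object. */
    static final class CacheStatus {
        int refCount;
        boolean localized;
    }

    // Hypothetical stand-in for the global map of cache entries.
    private final Map<String, CacheStatus> cachedArchives = new HashMap<>();

    /** Pins the entry under the map lock, then does the slow work under the entry lock only. */
    int localize(String key) {
        CacheStatus status;
        synchronized (cachedArchives) {              // 1st: map lock
            status = cachedArchives.computeIfAbsent(key, k -> new CacheStatus());
            synchronized (status) {                  // 2nd: entry lock
                status.refCount++;                   // pinned: deleteStale will skip it
            }
        }
        // Map lock released: slow work blocks only users of this one entry.
        synchronized (status) {
            if (!status.localized) {
                status.localized = true;             // placeholder for the real download
            }
            return status.refCount;
        }
    }

    /** Unpins an entry, using the same map-then-entry order. */
    void release(String key) {
        synchronized (cachedArchives) {
            CacheStatus status = cachedArchives.get(key);
            if (status == null) {
                return;
            }
            synchronized (status) {
                if (status.refCount > 0) {
                    status.refCount--;
                }
            }
        }
    }

    /** Removes unreferenced entries, again map lock first, entry lock second. */
    int deleteStale() {
        int removed = 0;
        synchronized (cachedArchives) {
            Iterator<CacheStatus> it = cachedArchives.values().iterator();
            while (it.hasNext()) {
                CacheStatus status = it.next();
                synchronized (status) {
                    if (status.refCount == 0) {
                        it.remove();
                        removed++;
                    }
                }
            }
        }
        return removed;
    }

    int size() {
        synchronized (cachedArchives) {
            return cachedArchives.size();
        }
    }

    public static void main(String[] args) {
        LockOrderSketch cache = new LockOrderSketch();
        cache.localize("a");
        cache.localize("b");
        cache.release("b");
        System.out.println("removed=" + cache.deleteStale() + " remaining=" + cache.size());
    }
}
```

The point of the sketch is only the invariant: no code path acquires cachedArchives while already holding an entry lock. If one path reversed the order, two threads could each hold one lock and wait on the other.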

> Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during
localization of Cache for tasks.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1098
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1098
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: patch-1098.txt
>
>
> Currently {{org.apache.hadoop.filecache.DistributedCache.getLocalCache(URI, Configuration,
Path, FileStatus, boolean, long, Path, boolean)}} allows only one {{TaskRunner}} thread in
the TT to localize {{DistributedCache}} across jobs. The current synchronization is across
baseDir; this has to be changed to lock on the same baseDir.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

