hadoop-mapreduce-issues mailing list archives

From "Arun C Murthy (JIRA)" <j...@apache.org>
Subject [jira] Updated: (MAPREDUCE-1098) Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.
Date Sat, 24 Oct 2009 03:20:59 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arun C Murthy updated MAPREDUCE-1098:
-------------------------------------

    Status: Open  (was: Patch Available)

This is looking good, some comments:

# The old o.a.h.mapreduce.filecache.DistributedCache.releaseCache calls the new API with 0L as the mtime, which is guaranteed to fail! We should either fix it to use the right mtime, always use 0 as the mtime (i.e. override getKey()), or throw an exception. Silently failing to decrement the cache is bad.
# I don't get why TrackerDistributedCacheManager.localizeCache actually deletes the file first... we should never have localized the file. Deleting it might hide a bug.
# We have forever had a stupid problem where we localize /foo on HDFS to <jobcache>/foo/foo in the local-fs. This patch continues that by doing some extra work (by calling cacheFilePath(cacheStatus.localLoadPath)). Shouldn't we just fix it? Adding more code to continue old bad practices is not useful.
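
To make the first comment concrete, here is a minimal sketch of why releasing with a different mtime cannot find the entry that was acquired. It is not the patch's code: the class CacheRefCounts, its key format, and the idea that the cache keeps a per-key count are assumptions made for illustration only. It takes the "throw an exception" option rather than skipping the decrement silently.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a cache that tracks usage per (URI, mtime) key.
class CacheRefCounts {
    private final Map<String, Integer> refCounts = new HashMap<>();

    // Assumed key format: the mtime is part of the key, so a different
    // mtime always maps to a different entry.
    static String key(String uri, long mtime) {
        return uri + "#" + mtime;
    }

    public synchronized void acquire(String uri, long mtime) {
        refCounts.merge(key(uri, mtime), 1, Integer::sum);
    }

    public synchronized void release(String uri, long mtime) {
        String k = key(uri, mtime);
        Integer count = refCounts.get(k);
        if (count == null || count == 0) {
            // Fail loudly instead of silently leaving the count untouched.
            throw new IllegalStateException("No cache entry in use for " + k);
        }
        refCounts.put(k, count - 1);
    }

    public synchronized int count(String uri, long mtime) {
        return refCounts.getOrDefault(key(uri, mtime), 0);
    }
}
```

With this sketch, acquire("hdfs://nn/foo", 1234L) followed by release("hdfs://nn/foo", 0L) throws, and the count for mtime 1234 stays at 1 until release is called with the matching mtime.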



> Incorrect synchronization in DistributedCache causes TaskTrackers to freeze up during localization of Cache for tasks.
> ----------------------------------------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1098
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1098
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>            Reporter: Sreekanth Ramakrishnan
>            Assignee: Amareshwari Sriramadasu
>             Fix For: 0.21.0
>
>         Attachments: MAPREDUCE-1098.patch, MAPREDUCE-1098.patch, MAPREDUCE-1098.patch, patch-1098-0.20.txt, patch-1098-1.txt, patch-1098-2.txt, patch-1098-3.txt, patch-1098-ydist.txt, patch-1098.txt
>
>
> Currently {{org.apache.hadoop.filecache.DistributedCache.getLocalCache(URI, Configuration, Path, FileStatus, boolean, long, Path, boolean)}} allows only one {{TaskRunner}} thread in the TT to localize {{DistributedCache}} across jobs. The current synchronization is across baseDir; this has to be changed to lock on the same baseDir.
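
As a rough illustration of the locking change described above, the following sketch serializes only the threads that work on the same base directory, so localization for different directories can proceed in parallel. This is one reading of the description and not the attached patch: the class PerDirLocks and its methods are hypothetical, and it uses Java 8 lambdas for brevity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// Hypothetical helper: one lock object per base directory.
class PerDirLocks {
    private final ConcurrentMap<String, Object> locks = new ConcurrentHashMap<>();

    // Returns the same lock object for the same baseDir on every call.
    Object lockFor(String baseDir) {
        return locks.computeIfAbsent(baseDir, d -> new Object());
    }

    // Runs the work while holding only the lock for this baseDir, so
    // threads working on other directories are not blocked.
    <T> T withLock(String baseDir, Supplier<T> work) {
        synchronized (lockFor(baseDir)) {
            return work.get();
        }
    }
}
```

A caller would wrap the localization step in withLock(baseDir, ...). Note that this sketch never removes lock objects, so the map grows with the number of distinct directories seen.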

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

