hadoop-mapreduce-issues mailing list archives

From "Hemanth Yamijala (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1140) Per cache-file refcount can become negative when tasks release distributed-cache files
Date Fri, 13 Nov 2009 11:53:39 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12777472#action_12777472 ]

Hemanth Yamijala commented on MAPREDUCE-1140:
---------------------------------------------

bq. This is done, because getLocalCache increments referenceCount first and then localizes.
Reference count should be decremented for the one just failed also. So, it should be added
to the list before the getLocalCache call.

Umm. But (at least theoretically) it is still possible that a call to getLocalCache fails
before referenceCount is incremented. For example, makeRelative throws IOException; so does
getLocalCacheForWrite. Hence, we still have a situation where we record a file as being
localized (by storing it in localizedCacheFiles) even though the reference count was never
incremented. releaseCache would then still have the bug this JIRA is about.

One more point I am slightly uncomfortable about is the duplication of state because of the
new list localizedCacheFiles. 

Here's an alternate proposal:

- Add a boolean isLocalized to CacheFile. By default, this is false. It will be set to true
only if distributedCacheManager.getLocalCache returns successfully.
- To handle the case you have mentioned above, where a failure can happen after referenceCount
is incremented in getLocalCache, I would suggest we catch exceptions inside getLocalCache
and, on an exception, decrement the referenceCount and re-throw the exception. This seems right
to me: if getLocalCache doesn't complete, shouldn't we be consistent by decrementing
the reference count?
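
To make the two points above concrete, here is a rough sketch of how they might fit together.
It is not the actual TaskTracker or cache manager code: SketchCacheManager, TaskSketch and
FailAt are hypothetical names made up for illustration, the failure points are simulated, and
it assumes the release path would consult isLocalized before decrementing.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Where the simulated localization fails (hypothetical, for illustration only).
enum FailAt { NONE, BEFORE_INCREMENT, AFTER_INCREMENT }

// Simplified stand-in for the task-side record of one cache file.
class CacheFile {
    final String uri;
    boolean isLocalized = false; // becomes true only after getLocalCache returns
    CacheFile(String uri) { this.uri = uri; }
}

// Simplified stand-in for the cache manager; not the real Hadoop class.
class SketchCacheManager {
    private final Map<String, Integer> referenceCounts = new HashMap<String, Integer>();

    synchronized int refCount(String uri) {
        Integer count = referenceCounts.get(uri);
        return count == null ? 0 : count;
    }

    synchronized String getLocalCache(String uri, FailAt failAt) throws IOException {
        if (failAt == FailAt.BEFORE_INCREMENT) {
            // Models a failure (such as makeRelative throwing) before the count is touched.
            throw new IOException("failed before increment: " + uri);
        }
        referenceCounts.put(uri, refCount(uri) + 1);
        try {
            if (failAt == FailAt.AFTER_INCREMENT) {
                throw new IOException("failed during localization: " + uri);
            }
            return "/local/cache/" + uri; // made-up local path
        } catch (IOException e) {
            // Undo the increment so a failed call leaves the count unchanged, then re-throw.
            referenceCounts.put(uri, refCount(uri) - 1);
            throw e;
        }
    }

    synchronized void releaseCache(String uri) {
        referenceCounts.put(uri, refCount(uri) - 1);
    }
}

// Simplified stand-in for the task-side setup and cleanup.
class TaskSketch {
    static boolean tryLocalize(SketchCacheManager manager, CacheFile file, FailAt failAt) {
        try {
            manager.getLocalCache(file.uri, failAt);
            file.isLocalized = true; // set only on a successful return
            return true;
        } catch (IOException e) {
            return false; // a real task would propagate the failure
        }
    }

    static void release(SketchCacheManager manager, CacheFile file) {
        // Only files that were actually localized give back a reference.
        if (file.isLocalized) {
            manager.releaseCache(file.uri);
            file.isLocalized = false;
        }
    }
}
```

With this shape, a failure on either side of the increment leaves the count where it was, and
releasing a file that never localized is a no-op, so the count should not go below zero.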

Would this work?

> Per cache-file refcount can become negative when tasks release distributed-cache files
> --------------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-1140
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1140
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: tasktracker
>    Affects Versions: 0.20.2, 0.21.0, 0.22.0
>            Reporter: Vinod K V
>            Assignee: Amareshwari Sriramadasu
>         Attachments: patch-1140-1.txt, patch-1140-ydist.txt, patch-1140.txt
>
>



