hadoop-mapreduce-issues mailing list archives

From "Mahadev konar (JIRA)" <j...@apache.org>
Subject [jira] Resolved: (MAPREDUCE-142) Race condition in DistributedCache
Date Fri, 02 Apr 2010 23:09:27 GMT

     [ https://issues.apache.org/jira/browse/MAPREDUCE-142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mahadev konar resolved MAPREDUCE-142.

    Resolution: Cannot Reproduce

This should be fixed in the current trunk. Please reopen if not.

> Race condition in DistributedCache
> ----------------------------------
>                 Key: MAPREDUCE-142
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-142
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>            Reporter: Christian Kunz
>            Assignee: Mahadev konar
> When an older version of a file in DistributedCache exists locally and multiple tasks per node start, they can run into a race condition:
> dir/mapred/local/taskTracker/archive/subdir/filename is in use and cannot be refreshed
> 	at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:313)
> 	at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:161)
> 	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:134)
> We ran a job with the wrong file; around 50 minutes later we put the fixed version into DFS and ran the same job again. The job had 11,000 maps (about 4-5 waves of map tasks) and produced 3,500 failed tasks with the above error. We eventually killed it and restarted the same job, with no problems this time.
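
For readers unfamiliar with this failure mode, here is a minimal Java sketch of one plausible interleaving. It is not the Hadoop source, and the report does not state the exact mechanism. The class `LocalCacheSketch` and its methods (`seed`, `pin`, `unpin`, `localize`) are hypothetical names invented for illustration. The assumption is that each task first pins the cached entry and then tries to refresh it, and that a stale entry pinned by more than one task is refused a refresh.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a per-node cache of localized files (illustration only).
class LocalCacheSketch {
    // One entry per cached file: timestamp of the local copy and number of tasks holding it.
    private static final class Entry {
        long localTimestamp;
        int refCount;
        Entry(long localTimestamp) { this.localTimestamp = localTimestamp; }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    // Records that a copy with the given timestamp already exists locally.
    // The other methods assume the entry has been seeded first.
    synchronized void seed(String name, long timestamp) {
        entries.put(name, new Entry(timestamp));
    }

    // A task announces that it is about to use the entry.
    synchronized void pin(String name) {
        entries.get(name).refCount++;
    }

    // A task (finished or failed) gives up its hold on the entry.
    synchronized void unpin(String name) {
        Entry e = entries.get(name);
        if (e.refCount > 0) {
            e.refCount--;
        }
    }

    // Refreshes the local copy if the DFS copy is newer, but only when
    // no other task has the entry pinned (assumed rule, see lead-in).
    synchronized void localize(String name, long dfsTimestamp) throws IOException {
        Entry e = entries.get(name);
        if (e.localTimestamp == dfsTimestamp) {
            return; // local copy is already fresh
        }
        if (e.refCount > 1) {
            throw new IOException(name + " is in use and cannot be refreshed");
        }
        e.localTimestamp = dfsTimestamp; // stands in for re-copying the file from DFS
    }

    synchronized long localTimestamp(String name) {
        return entries.get(name).localTimestamp;
    }
}
```

Under these assumptions, if two tasks on the same node both call `pin` before either calls `localize`, the refresh is refused even though neither task is actually reading the old file. Once the failed tasks call `unpin`, a later attempt can refresh the entry, which would be consistent with the restarted job running cleanly.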

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
