hadoop-common-dev mailing list archives

From "Christian Kunz (JIRA)" <j...@apache.org>
Subject [jira] Created: (HADOOP-2803) Race condition in DistributedCache
Date Sat, 09 Feb 2008 01:27:08 GMT
Race condition in DistributedCache

                 Key: HADOOP-2803
                 URL: https://issues.apache.org/jira/browse/HADOOP-2803
             Project: Hadoop Core
          Issue Type: Bug
          Components: mapred
    Affects Versions: 0.16.0
            Reporter: Christian Kunz

When an older version of a DistributedCache file already exists on a node's local disk and
several tasks start on that node at the same time, the tasks can hit a race condition and
fail with:

dir/mapred/local/taskTracker/archive/subdir/filename is in use and cannot be refreshed
	at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:313)
	at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:161)
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:134)

We ran a job with the wrong file; about 50 minutes later we put the fixed version into DFS
and ran the same job again. The job had 11,000 maps (about 4-5 waves of map tasks) and
produced 3,500 failed tasks with the above error. We eventually killed it and restarted the
same job, this time with no problems.
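
For illustration only, here is a minimal sketch of one way an "in use and cannot be refreshed"
failure can arise. It is not taken from the Hadoop source: the class CacheRefreshRaceSketch,
its CacheEntry type, the reference-count rule, and the timestamps are all hypothetical. It
assumes a local copy may only be refreshed when no other task currently holds a reference to it.

```java
import java.io.IOException;

// Hypothetical model of a reference-counted local cache entry; not Hadoop code.
public class CacheRefreshRaceSketch {

    static final class CacheEntry {
        private final String localPath;
        private long localTimestamp; // version of the copy on local disk
        private int refCount;        // number of tasks currently using the copy

        CacheEntry(String localPath, long localTimestamp) {
            this.localPath = localPath;
            this.localTimestamp = localTimestamp;
        }

        // A task asks for the file, passing the timestamp it sees in DFS.
        synchronized void acquire(long dfsTimestamp) throws IOException {
            refCount++;
            if (localTimestamp != dfsTimestamp) {
                // Local copy is stale. Assumed rule: refuse to overwrite a
                // copy that another task is still holding.
                if (refCount > 1) {
                    refCount--;
                    throw new IOException(localPath
                            + " is in use and cannot be refreshed");
                }
                localTimestamp = dfsTimestamp; // stand-in for re-downloading
            }
        }

        synchronized void release() {
            refCount--;
        }
    }

    static String tryAcquire(CacheEntry entry, long dfsTimestamp) {
        try {
            entry.acquire(dfsTimestamp);
            return "ok";
        } catch (IOException e) {
            return "refused";
        }
    }

    // Replays one interleaving deterministically instead of using threads.
    static String runScenario() {
        CacheEntry entry = new CacheEntry(
                "dir/mapred/local/taskTracker/archive/subdir/filename", 1L);
        StringBuilder out = new StringBuilder();
        out.append("A:").append(tryAcquire(entry, 1L));  // A holds the old copy
        out.append(" B:").append(tryAcquire(entry, 2L)); // B sees a newer file
        entry.release();                                 // A finishes
        out.append(" C:").append(tryAcquire(entry, 2L)); // C can now refresh
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(runScenario());
    }
}
```

In this model, task B fails only because task A still holds the stale copy; once A releases
it, task C refreshes the file without error. That is consistent with a later run succeeding,
but whether it matches the actual cause in DistributedCache is an assumption.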

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
