hadoop-common-dev mailing list archives

From "Robert Chansler (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-2803) Race condition in DistributedCache
Date Tue, 25 Mar 2008 03:03:27 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-2803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Chansler updated HADOOP-2803:

    Fix Version/s:     (was: 0.17.0)

> Race condition in DistributedCache
> ----------------------------------
>                 Key: HADOOP-2803
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2803
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.16.0
>            Reporter: Christian Kunz
>            Assignee: Mahadev konar
> When an older version of a file in DistributedCache exists locally and multiple tasks
> per node start, they can run into a race condition:
> dir/mapred/local/taskTracker/archive/subdir/filename is in use and cannot be refreshed
> 	at org.apache.hadoop.filecache.DistributedCache.localizeCache(DistributedCache.java:313)
> 	at org.apache.hadoop.filecache.DistributedCache.getLocalCache(DistributedCache.java:161)
> 	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:134)
> We ran a job with the wrong file, then around 50 minutes later we put the fixed version
> into DFS and ran the same job again. The job had 11,000 maps (about 4-5 waves of map tasks)
> and produced 3,500 failed tasks with the above error. We eventually killed it and restarted
> the same job, with no problems this time.
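
The failure mode described above can be illustrated with a small model. This is a minimal sketch, not the DistributedCache code: the class CacheRaceSketch and all of its methods are hypothetical, and it assumes the error comes from a reference count that is incremented before the staleness check runs. Under that assumption, two tasks that register for the same stale file before either has refreshed it make the file look "in use", even though no task from the earlier job still holds it.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical model of a ref-counted local file cache; not Hadoop's implementation. */
public class CacheRaceSketch {

    /** Bookkeeping for one localized file. */
    static final class Entry {
        long localTimestamp; // modification time of the copy on local disk
        int refCount;        // number of tasks that have registered for the file
        Entry(long localTimestamp) { this.localTimestamp = localTimestamp; }
    }

    private final Map<String, Entry> entries = new HashMap<>();

    private Entry entry(String path) {
        // Long.MIN_VALUE stands for "no local copy yet" in this sketch.
        return entries.computeIfAbsent(path, p -> new Entry(Long.MIN_VALUE));
    }

    /** Pretend an older copy of the file is already on local disk. */
    public synchronized void seed(String path, long timestamp) {
        entry(path).localTimestamp = timestamp;
    }

    /** Racy variant, step 1: take a reference without checking staleness. */
    public synchronized void register(String path) {
        entry(path).refCount++;
    }

    /** Racy variant, step 2: refuse to refresh when more than one reference exists. */
    public synchronized void localize(String path, long dfsTimestamp) {
        Entry e = entry(path);
        if (e.localTimestamp != dfsTimestamp) {
            if (e.refCount > 1) {
                throw new IllegalStateException(path + " is in use and cannot be refreshed");
            }
            e.localTimestamp = dfsTimestamp; // stands in for re-copying from DFS
        }
    }

    /** Atomic variant: staleness check, refresh, and reference happen under one lock. */
    public synchronized void acquire(String path, long dfsTimestamp) {
        Entry e = entry(path);
        if (e.localTimestamp != dfsTimestamp) {
            if (e.refCount > 0) {
                // Here the stale copy really is held by a running task.
                throw new IllegalStateException(path + " is in use and cannot be refreshed");
            }
            e.localTimestamp = dfsTimestamp; // stands in for re-copying from DFS
        }
        e.refCount++;
    }

    public synchronized int refCount(String path) { return entry(path).refCount; }

    public synchronized long localTimestamp(String path) { return entry(path).localTimestamp; }

    public static void main(String[] args) {
        String path = "subdir/filename";

        // Interleaving: tasks A and B both register, then A tries to refresh.
        CacheRaceSketch racy = new CacheRaceSketch();
        racy.seed(path, 1L);
        racy.register(path); // task A
        racy.register(path); // task B
        try {
            racy.localize(path, 2L); // task A sees refCount == 2 and fails
        } catch (IllegalStateException ex) {
            System.out.println("racy: " + ex.getMessage());
        }

        // Same two tasks, but each one checks, refreshes, and registers atomically.
        CacheRaceSketch atomic = new CacheRaceSketch();
        atomic.seed(path, 1L);
        atomic.acquire(path, 2L); // task A refreshes the stale copy
        atomic.acquire(path, 2L); // task B finds it already fresh
        System.out.println("atomic: refCount=" + atomic.refCount(path));
    }
}
```

In the racy sequence the refresh is rejected only because of the ordering of the two tasks' steps, which would be consistent with a later rerun of the same job succeeding. The atomic variant shows one possible way to avoid that ordering problem; whether it matches the eventual fix for this issue is not stated in the report.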

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
