hadoop-mapreduce-issues mailing list archives

From "Keith Stevens (Created) (JIRA)" <j...@apache.org>
Subject [jira] [Created] (MAPREDUCE-3276) hadoop dfs -copyToLocal/copyFromLocal called within Hadoop Streaming returns early
Date Wed, 26 Oct 2011 22:05:32 GMT
hadoop dfs -copyToLocal/copyFromLocal called within Hadoop Streaming returns early

                 Key: MAPREDUCE-3276
                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3276
             Project: Hadoop Map/Reduce
          Issue Type: Bug
          Components: contrib/streaming
    Affects Versions: 0.20.2
         Environment: Linux RedHat Enterprise Linux 5.
31 node cluster with 1 as JobTracker and NameNode, and 30 as TaskTracker and DataNode.
            Reporter: Keith Stevens

I'm using the Cloudera Hadoop release 0.20.2.+737 to parallelize bash scripts with Hadoop.

Below is an example script that I've been running, which simply copies a file from HDFS to
a local node.
 hadoop dfs -copyToLocal /path/to/some/large/file/myFile myFile
 # Spin until the file is fully copied.
 while [ ! -f myFile ]
 do
   echo "spin"
   sleep 1
 done

Surprisingly, the copy call returns before the file is copied, if the file is sufficiently
large, and the while loop spins for several iterations.  I'm seeing similar behavior with
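
One way to narrow this down is to check the exit status of the copy instead of spinning, which separates "the copy failed" from "the copy returned success but the file is not there yet". The following is only a minimal sketch: fetch_file and the COPY_CMD variable are hypothetical names introduced for illustration, and the default command is the one from the script above. Diagnostics go to stderr so they do not end up in the streaming job's output.

```shell
#!/usr/bin/env bash
# COPY_CMD is a hypothetical knob for this sketch; it defaults to the
# command used in the script above.
COPY_CMD=${COPY_CMD:-"hadoop dfs -copyToLocal"}

# fetch_file SRC DST (hypothetical helper)
# Returns 0 if the copy succeeded and DST exists and is non-empty,
# 1 if the copy command itself reported failure,
# 2 if the command reported success but DST is missing or empty.
fetch_file() {
  local src=$1 dst=$2
  # Run the copy in the foreground and inspect its exit status.
  # $COPY_CMD is intentionally unquoted so it splits into command + args.
  if ! $COPY_CMD "$src" "$dst"; then
    echo "copy command failed for $src" >&2
    return 1
  fi
  # The command claims success; confirm the file is really there.
  if [ ! -s "$dst" ]; then
    echo "copy returned success but $dst is missing or empty" >&2
    return 2
  fi
  echo "copied $src to $dst" >&2
  return 0
}
```

Called as `fetch_file /path/to/some/large/file/myFile myFile`, a return code of 2 would correspond to the early-return behavior described here, while a return code of 1 would point to an ordinary copy failure.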

I've asked about this issue on other forums and no one else seems to have had the problem,
although I don't know how many people are attempting to do this particular task.

Has this been fixed in more recent versions of Hadoop?
