[ https://issues.apache.org/jira/browse/MAPREDUCE-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved MAPREDUCE-3276.
-----------------------------------------
Resolution: Later
This issue is pretty stale at this point. Closing with Later. If it is still a problem,
then please open a new JIRA.
> hadoop dfs -copyToLocal/copyFromLocal called within Hadoop Streaming returns early
> ----------------------------------------------------------------------------------
>
> Key: MAPREDUCE-3276
> URL: https://issues.apache.org/jira/browse/MAPREDUCE-3276
> Project: Hadoop Map/Reduce
> Issue Type: Bug
> Components: contrib/streaming
> Affects Versions: 0.20.2
> Environment: Linux RedHat Enterprise Linux 5.
> 31 node cluster with 1 as JobTracker and NameNode, and 30 as TaskTracker and DataNode.
> Reporter: Keith Stevens
> Labels: hadoop, shell, streaming
>
> I'm using the Cloudera Hadoop release 0.20.2.+737 to parallelize bash scripts with Hadoop Streaming.
> Below is an example script that I've been running, which simply copies a file from HDFS to a local node.
> {code:title=SampleMapper.sh|borderStyle=solid}
> hadoop dfs -copyToLocal /path/to/some/large/file/myFile myFile
> # Spin until the file is fully copied.
> while [ ! -f myFile ]
> do
> echo "spin"
> sleep 1
> done
> {code}
> Surprisingly, if the file is sufficiently large, the copy call returns before the file is copied, and the while loop spins for several iterations. I'm seeing similar behavior with copyFromLocal.
> I've asked about this issue on other forums and no one else seems to have had the problem, although I don't know how many people are attempting to do this particular task.
> Has this been fixed in more recent versions of hadoop?
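
For anyone reproducing the report above, one way to narrow it down is to check the exit status of the copy instead of polling for the file, and to send diagnostics to stderr, since Hadoop Streaming treats a mapper's stdout as its output records. The sketch below is only an illustration: `copy_and_verify` is a hypothetical helper name, not part of Hadoop, and the non-empty check assumes the file being copied is expected to contain data.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Hadoop): run a copy command in the
# foreground, then check both its exit status and the destination file.
# Usage: copy_and_verify <dest> <command> [args...]
copy_and_verify() {
  local dest="$1"
  local status
  shift

  # The shell waits here until the command's process exits.
  "$@"
  status=$?
  if [ "$status" -ne 0 ]; then
    # stderr goes to the task logs; stdout would become mapper output.
    echo "copy command failed with status $status: $*" >&2
    return 1
  fi

  # Assumption: the copied file should be non-empty.
  if [ ! -s "$dest" ]; then
    echo "copy reported success but $dest is missing or empty" >&2
    return 1
  fi
  return 0
}

# Example call, using the command from the report:
#   copy_and_verify myFile hadoop dfs -copyToLocal /path/to/some/large/file/myFile myFile || exit 1
```

If the helper reports a non-zero status or a missing file, the copy itself failed or wrote somewhere else, which is a different problem from the command returning before the copy has finished.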
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)