hadoop-common-user mailing list archives

From Amareshwari Sri Ramadasu <amar...@yahoo-inc.com>
Subject Re: hadoop-streaming tutorial with -archives option
Date Mon, 22 Feb 2010 03:43:01 GMT
Hi Michael,

There is a bug with passing a symlink name to the -files and -archives options; see MAPREDUCE-787.
If you don't pass a symlink name for the URI in -files or -archives, a symlink with the file's
actual name is created.
So, if you pass -archives "hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar",
a symlink named cachedir.jar will be created.
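A minimal sketch of the naming rule described above. cache_link_name is a hypothetical shell helper, not part of Hadoop. It only mimics which link name appears in the task's working directory: the #fragment when one is given (the documented form, which 0.20.1 rejects because of MAPREDUCE-787), otherwise the last path component of the URI.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): prints the symlink name a
# -files/-archives URI is expected to produce in the task's working directory.
cache_link_name() {
  uri=$1
  case $uri in
    *'#'*)
      # Documented form: the #fragment names the link.
      # Note: 0.20.1 fails on this form (MAPREDUCE-787).
      printf '%s\n' "${uri##*#}" ;;
    *)
      # No fragment: the link takes the file's own name.
      printf '%s\n' "${uri##*/}" ;;
  esac
}

cache_link_name "hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar"
```

Run as shown, it prints cachedir.jar.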

-files and -archives are generic options. For all commands, the generic options must come first,
followed by the command options (for streaming: -input, -output, -mapper, -reducer, and so on).
The documentation above has been corrected in MAPREDUCE-813.
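For illustration, here is the first command from the quoted message rewritten in that order: the generic -archives option first, then the streaming options, and without the #testlink fragment that triggers MAPREDUCE-787. This is a sketch: the jar location and HDFS paths are copied from the thread and are assumptions about your setup, and when no hadoop binary is on the PATH the script only prints the command it would run. Since the link will then be named cachedir.jar, an input.txt written for the tutorial's testlink/ name would presumably need its paths changed to cachedir.jar/...

```shell
#!/bin/sh
# Paths taken from the thread; adjust them for your installation.
STREAMING_JAR="$HADOOP_HOME/hadoop-0.20.1-streaming.jar"
ARCHIVE="hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar"

# Generic options go first (no #fragment, to avoid MAPREDUCE-787) ...
set -- -archives "$ARCHIVE"
# ... followed by the streaming command options.
set -- "$@" -input "samples/cachefile/input.txt" \
            -mapper "xargs cat" -reducer "cat" \
            -output "samples/cachefile/out"

if command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$STREAMING_JAR" "$@"
else
  # Dry run: no hadoop on PATH, so just show the command.
  echo hadoop jar "$STREAMING_JAR" "$@"
fi
```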


On 2/20/10 9:57 AM, "Michael Kintzer" <michael.kintzer@zerk.com> wrote:

> Hi,
> Hadoop/HDFS newbie here. I've been struggling to get the streaming example working with -archives; cf. http://hadoop.apache.org/common/docs/r0.20.1/streaming.html#Large+files+and+archives+in+Hadoop+Streaming
> My environment is the Pseudo-distributed environment setup per: http://hadoop.apache.org/common/docs/current/quickstart.html#PseudoDistributed
> I've run into a couple of issues. The first is a "FileNotFoundException" when the #symlink suffix is specified with the -archives or -files options, as in the tutorial.
> hadoop jar $HADOOP_HOME/hadoop-0.20.1-streaming.jar -archives "hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar#testlink" -input "samples/cachefile/input.txt" -mapper "xargs cat" -reducer "cat" -output "samples/cachefile/out"
> java.io.FileNotFoundException: File hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar#testlink does not exist.
>       at org.apache.hadoop.util.GenericOptionsParser.validateFiles(GenericOptionsParser.java:349)
>       at org.apache.hadoop.util.GenericOptionsParser.processGeneralOptions(GenericOptionsParser.java:275)
>       at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:375)
>       at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
>       at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:138)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:59)
>       at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>       at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:32)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>       at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>       at java.lang.reflect.Method.invoke(Method.java:597)
>       at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> If I remove "#testlink" from the -archives argument, the error goes away, but the symlink described in the tutorial documentation is not created.
> I've seen this JIRA issue, http://issues.apache.org/jira/browse/HADOOP-6178, which shows no Fix Version, but its issue links point to others that are supposedly fixed in 0.20.1, which I have.
> The second issue is "Unrecognized option: -archives" when -archives is specified at the end of the argument list.
> hadoop jar $HADOOP_HOME/hadoop/hadoop-0.20.1-streaming.jar -input "samples/cachefile/input.txt" -mapper "xargs cat" -reducer "cat" -output "samples/cachefile/out9" -archives "hdfs://localhost:9000/user/me/samples/cachefile/cachedir.jar#testlink"
> 10/02/19 14:29:11 ERROR streaming.StreamJob: Unrecognized option: -archives
> Any help getting past this is appreciated. Am I missing a configuration setting that allows symlinking? I'm really hoping to use the archives feature.
> -Michael
