Hi Mahmood,
I tried testing a simplified version of this, and it worked in my environment.
One thing I noticed is that your ls output is for directory /in/in, but the job execution
uses /in for the input path. If the input path is a directory, then the assumption is that
it will be a directory consisting solely of files (not sub-directories). This explains the
error message you were seeing. The MapReduce client scanned path /in, found a child entry
named /in/in, and then tried to get HDFS block locations for /in/in. Since it's a directory
instead of a file, attempting to get block locations results in an error. I was able to replicate
the error once I set up my directory structure to match yours.
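A quick way to check for this situation before submitting a job is to look for directory entries in the input path. The sketch below assumes the usual `hadoop fs -ls` listing format, where a directory's permission string starts with `d` and the path is the last field. `list_subdirs` is a hypothetical helper name, and paths containing spaces are not handled.

```shell
# Hypothetical helper: read `hadoop fs -ls` output on stdin and print
# the paths of any directory entries (permission string starts with "d").
list_subdirs() {
  awk '/^d/ { print $NF }'
}

# Usage (requires a working Hadoop client; /in stands for the job's input path):
#   hadoop fs -ls /in | list_subdirs
# Any path printed is a sub-directory that a non-recursive job would
# try to read as a file.
```

If this prints nothing, the input directory contains only files and the job should not hit the "Path is not a file" error.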
I suspect this is a mistake in your directory structure, but if it turns out that you actually
want recursive scanning behavior, then there is an option you can set on FileInputFormat to
get that.
http://hadoop.apache.org/docs/r2.6.0/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html#INPUT_DIR_RECURSIVE
The implementation of this feature was tracked in JIRA issue MAPREDUCE-3193, and you can find
more details there.
https://issues.apache.org/jira/browse/MAPREDUCE-3193
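As a rough sketch of the two ways forward, the commands below either flatten the directory layout or turn on recursive scanning through the configuration key behind INPUT_DIR_RECURSIVE. This assumes HADOOP_HOME and WORK_DIR are set as in the command quoted later in this thread, and that the grep example accepts generic `-D` options. The variable names and the DRY_RUN switch are illustrative, not part of Hadoop. Pick one option, not both.

```shell
# Assumptions: HADOOP_HOME is set; WORK_DIR defaults to the current
# directory, as in the command quoted later in this thread.
WORK_DIR="${WORK_DIR:-$(pwd)}"
IN_DIR="${WORK_DIR}/data-MicroBenchmarks/in"
OUT_DIR="${WORK_DIR}/data-MicroBenchmarks/out/grep"
EXAMPLES_JAR="${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar"

# Configuration key behind FileInputFormat.INPUT_DIR_RECURSIVE.
RECURSIVE_OPT="mapreduce.input.fileinputformat.input.dir.recursive=true"

# Dry run by default: commands are printed, not executed.
# Run with DRY_RUN= (empty) to execute them.
DRY_RUN="${DRY_RUN-echo}"

# Option 1: flatten the layout so the input directory holds only files.
# The glob is quoted so that the HDFS shell expands it, not the local shell.
$DRY_RUN hadoop fs -mv "${IN_DIR}/in/*" "${IN_DIR}/"
$DRY_RUN hadoop fs -rmdir "${IN_DIR}/in"

# Option 2: keep the nested layout and request recursive scanning instead.
# Assumes the grep example parses generic -D options.
$DRY_RUN "${HADOOP_HOME}/bin/hadoop" jar "${EXAMPLES_JAR}" grep \
  -D "${RECURSIVE_OPT}" "${IN_DIR}" "${OUT_DIR}" 'a*xyz'
```

Quoting the regular expression ('a*xyz') keeps the local shell from expanding it against file names in the current directory.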
I hope this helps.
Chris Nauroth
Hortonworks
http://hortonworks.com/
From: Mahmood Naderan <nt_mahmood@yahoo.com>
Reply-To: Mahmood Naderan <nt_mahmood@yahoo.com>
Date: Sunday, April 19, 2015 at 12:28 AM
To: Chris Nauroth <cnauroth@hortonworks.com>,
User <user@hadoop.apache.org>
Subject: Re: Again incompatibility, locating example jars
Thanks, that fixed the error. However, it still cannot be run the way it ran on the previous version.
The command is
time ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar
grep ${WORK_DIR}/data-MicroBenchmarks/in ${WORK_DIR}/data-MicroBenchmarks/out/grep a*xyz
where ${WORK_DIR}=`pwd`
Here is the output
[mahmood@tiger MicroBenchmarks]$ pwd
/home/mahmood/bigdatabench/BigDataBench_V3.1_Hadoop_Hive/MicroBenchmarks
[mahmood@tiger MicroBenchmarks]$ hadoop fs -ls /home/mahmood/bigdatabench/BigDataBench_V3.1_Hadoop_Hive/MicroBenchmarks/data-MicroBenchmarks/in/in
15/04/19 11:56:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 1 mahmood supergroup 524149543 2015-04-19 10:19 /home/mahmood/bigdatabench/BigDataBench_V3.1_Hadoop_Hive/MicroBenchmarks/data-MicroBenchmarks/in/in/lda_wiki1w_1
-rw-r--r-- 1 mahmood supergroup 526316345 2015-04-19 10:19 /home/mahmood/bigdatabench/BigDataBench_V3.1_Hadoop_Hive/MicroBenchmarks/data-MicroBenchmarks/in/in/lda_wiki1w_2
....
15/04/19 11:57:06 INFO mapred.MapTask: Starting flush of map output
15/04/19 11:57:06 INFO mapred.LocalJobRunner: map task executor complete.
15/04/19 11:57:06 WARN mapred.LocalJobRunner: job_local962861772_0001
java.lang.Exception: java.io.FileNotFoundException: Path is not a file: /home/mahmood/bigdatabench/BigDataBench_V3.1_Hadoop_Hive/MicroBenchmarks/data-MicroBenchmarks/in/in
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:70)
at org.apache.hadoop.hdfs.server.namenode.INodeFile.valueOf(INodeFile.java:56)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getBlockLocationsUpdateTimes(FSNamesystem.java:1891)
...
As you can see, it says the path is not a file, but the files are actually there.
Regards,
Mahmood
On Sunday, April 19, 2015 10:47 AM, Chris Nauroth <cnauroth@hortonworks.com>
wrote:
Hello Mahmood,
You want the hadoop-mapreduce-examples-2.6.0.jar file. The grep job (as well as other example
jobs like wordcount) resides in this jar file for the 2.x line of the codebase.
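To avoid hard-coding the version number, something like the following could locate the jar. It is a minimal sketch assuming the 2.x layout shown in this thread (share/hadoop/mapreduce under the install directory); `find_examples_jar` is a hypothetical helper name. The -sources jars live in a sources/ sub-directory, so the glob does not match them.

```shell
# Hypothetical helper: print the path of the MapReduce examples jar
# under a Hadoop 2.x installation directory given as $1.
find_examples_jar() {
  for jar in "$1"/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar; do
    # If the glob matched nothing, the literal pattern is left in place; skip it.
    [ -f "$jar" ] || continue
    printf '%s\n' "$jar"
    return 0
  done
  return 1
}

# Usage (requires HADOOP_HOME to point at a 2.x install):
#   hadoop jar "$(find_examples_jar "$HADOOP_HOME")" grep <in> <out> <regex>
```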
Chris Nauroth
Hortonworks
http://hortonworks.com/
From: Mahmood Naderan <nt_mahmood@yahoo.com>
Reply-To: User <user@hadoop.apache.org>, Mahmood
Naderan <nt_mahmood@yahoo.com>
Date: Saturday, April 18, 2015 at 10:59 PM
To: User <user@hadoop.apache.org>
Subject: Again incompatibility, locating example jars
Hi,
There is another incompatibility between 1.2.0 and 2.6.0. I would appreciate it if someone could
help me figure it out.
This command works on 1.2.0
time ${HADOOP_HOME}/bin/hadoop jar ${HADOOP_HOME}/hadoop-examples-*.jar grep ${WORK_DIR}/data-MicroBenchmarks/in
${WORK_DIR}/data-MicroBenchmarks/out/grep a*xyz
But on 2.6.0, I receive this error:
Not a valid JAR: /home/mahmood/bigdatabench/apache/hadoop-2.6.0/hadoop-examples-*.jar
Indeed, there is no such file in that folder, so I guess it has been moved to another folder.
However, there are three candidate jar files in 2.6.0:
/home/mahmood/bigdatabench/apache/hadoop-2.6.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.6.0-sources.jar
/home/mahmood/bigdatabench/apache/hadoop-2.6.0/share/hadoop/mapreduce/sources/hadoop-mapreduce-examples-2.6.0-test-sources.jar
/home/mahmood/bigdatabench/apache/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar
Which one should I use?
Regards,
Mahmood