hadoop-common-user mailing list archives

From Boyu Zhang <boyuzhan...@gmail.com>
Subject hadoop streaming using a java program as mapper
Date Wed, 02 May 2012 05:17:20 GMT
Hi All,

I am in a slightly strange situation. I am using Hadoop streaming to run
a bash script, myMapper.sh, which calls a Java program, then an R program,
and then outputs intermediate key/value pairs. I used the -file option to
ship the Java and R files, but the Java program is never executed by the
streaming job. myMapper.sh contains something like this:

java myJava arguments

And in the streaming command, I use something like this:

hadoop jar /opt/hadoop/hadoop-0.20.2-streaming.jar -D mapred.reduce.tasks=0
-input /user/input -output /user/output7 -mapper ./myMapper.sh -file
myJava.class  -verbose

The myJava program does not run when I execute the job this way. If I go to
the actual slave node to check the files, myMapper.sh has been shipped to
the slave node, but myJava.class has not; it sits inside the job.jar file.
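
To narrow this down, a small diagnostic could be added to myMapper.sh. The sketch below is only an illustration: the helper name check_shipped is hypothetical and not part of my actual script. It assumes the mapper runs under bash and that shipped files are expected in the task's working directory. Messages go to stderr, because streaming treats the mapper's stdout as key/value output.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper for myMapper.sh (not in the original script).
# All messages go to stderr so they land in the task logs and do not
# pollute the key/value stream that streaming reads from stdout.

check_shipped() {
    # Report whether each named file is visible from the current directory.
    # Returns 0 if every file is present, 1 if at least one is missing.
    local f missing=0
    for f in "$@"; do
        if [ -e "$f" ]; then
            echo "found: $PWD/$f" >&2
        else
            echo "missing: $f (cwd is $PWD)" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Inside myMapper.sh this could be used as `check_shipped myJava.class && java -cp . myJava arguments`, so the task log shows whether the class file was visible before java was invoked. Setting the classpath to the current directory with `-cp .` is an assumption about where the file would land.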

Can someone offer some insight into how to run a Java program through
Hadoop streaming? Thanks!

