hadoop-common-user mailing list archives

From Devi Kumarappan <kpala...@att.net>
Subject Re: Issue with Hadoop Streaming
Date Thu, 02 Aug 2012 20:03:44 GMT
My mapper is a Perl script, not Java. So how do I specify the 

From: Robert Evans <evans@yahoo-inc.com>
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>; 
"common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: Thu, August 2, 2012 12:59:50 PM
Subject: Re: Issue with Hadoop Streaming

It depends on the input format you use.  You probably want to look at using 

From: Devi Kumarappan <kpalania@att.net>
Date: Wednesday, August 1, 2012 8:09 PM
To: common-user@hadoop.apache.org
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop streaming with a Perl script as the mapper and no 
reducer. My requirement is for the mapper to run on one file at a time, since 
I have to do pattern processing on the entire contents of one file at a time and 
the files are small.

The Hadoop streaming manual suggests the following solution:

*  Generate a file containing the full HDFS path of the input files. Each map 
task would get one file name as input.
*  Create a mapper script which, given a filename, will get the file to local 
disk, gzip the file and put it back in the desired output directory.
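
The two steps above can be sketched as a small shell mapper. This is only an illustration under stated assumptions, not the manual's own script: the function names (extract_path, fetch_file, map_files) are hypothetical, the line count stands in for the real per-file pattern processing, and the same loop could be written in Perl. It assumes each input line carries one HDFS path, possibly preceded by a tab-separated key such as a byte offset, depending on the input format in use.

```shell
#!/bin/sh
# Sketch of a streaming mapper that handles one whole file per input line.

# Return the last tab-separated field of a line (the HDFS path).
extract_path() {
    printf '%s\n' "$1" | awk -F'\t' '{ print $NF }'
}

# Stream the contents of one HDFS file to stdout.
fetch_file() {
    hadoop fs -cat "$1"
}

# Read paths from stdin and emit "path<TAB>line count" for each file.
map_files() {
    while IFS= read -r line; do
        path=$(extract_path "$line")
        [ -n "$path" ] || continue
        # Placeholder for the real per-file processing: count the lines.
        # stdin is redirected so fetch_file cannot consume the path list.
        count=$(fetch_file "$path" < /dev/null | awk 'END { print NR }')
        printf '%s\t%s\n' "$path" "$count"
    done
}

# In the deployed mapper script, finish with a bare call:  map_files
```

Because every path is fetched and processed in full before the next line is read, the mapper always sees one complete file at a time, however many paths a single map task receives.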

I am running the following command.

hadoop jar 
/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar -input 
/user/devi/file.txt -output /user/devi/s_output -mapper "/usr/bin/perl 

/user/devi/file.txt contains the following two lines.


When this runs, instead of spawning two mappers for a.txt and b.txt as per the 
document, only one mapper is spawned, and the Perl script gets both 
/user/devi/s_input/a.txt and /user/devi/s_input/b.txt as input.

How can I make the Perl mapper script run on only one file at a time?
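
The number of map tasks follows the number of input splits, and a short list file usually ends up as a single split under the default text input format. One option that may help, assuming the installed streaming jar accepts it, is org.apache.hadoop.mapred.lib.NLineInputFormat, which creates one split per N lines of input. The sketch below wraps such an invocation in a hypothetical run_job function. The mapper name mapper.sh is a placeholder, not the script from this thread, and the HADOOP variable is only there so the command can be dry-run with echo.

```shell
#!/bin/sh
# Sketch: map-only streaming job with one input line (one file name) per map task.
HADOOP=${HADOOP:-hadoop}
STREAMING_JAR=/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar

run_job() {
    # -D options are generic options and must come before the streaming options.
    # mapper.sh is a hypothetical mapper script shipped with the job via -file.
    "$HADOOP" jar "$STREAMING_JAR" \
        -D mapred.reduce.tasks=0 \
        -D mapred.line.input.format.linespermap=1 \
        -input /user/devi/file.txt \
        -output /user/devi/s_output \
        -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
        -mapper mapper.sh \
        -file mapper.sh
}

# To submit the job, call:  run_job
```

With an input format other than the default, streaming may hand the mapper the key (a byte offset) followed by a tab before each path, so the mapper should be prepared to strip it.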

Appreciate your help, Thanks, Devi