hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Issue with Hadoop Streaming
Date Thu, 02 Aug 2012 19:59:00 GMT
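It depends on the input format you use. The default TextInputFormat splits input by size, so a two-line file ends up in a single split and therefore a single mapper. You probably want to look at using NLineInputFormat, which puts N lines (one by default) in each split.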
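
As a rough sketch of what that could look like, here is the command from the original message (quoted below) with one added option, -inputformat. The wrapper function name run_streaming_job and the HADOOP override variable are illustrative assumptions, not part of Hadoop. The class is the old-API org.apache.hadoop.mapred.lib.NLineInputFormat.

```shell
#!/bin/sh
# Sketch only: the command from the original message plus -inputformat.
# HADOOP is a hypothetical override so the command can be previewed
# (for example HADOOP=echo) without submitting a job.
HADOOP="${HADOOP:-hadoop}"

run_streaming_job() {
  # NLineInputFormat creates one split per N input lines (N defaults to 1),
  # so each line of /user/devi/file.txt should go to its own map task.
  "$HADOOP" jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
    -input /user/devi/file.txt \
    -output /user/devi/s_output \
    -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
}
```

Calling run_streaming_job submits the job; with HADOOP=echo it only prints the command. One thing to watch for: with an input format other than the default, streaming may hand each record to the mapper as key, tab, value, so the script may need to drop a leading byte offset before using the path.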

From: Devi Kumarappan <kpalania@att.net>
Reply-To: "mapreduce-user@hadoop.apache.org"
Date: Wednesday, August 1, 2012 8:09 PM
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>,
"mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop streaming with a Perl script as the mapper and no reducer. My
requirement is for the mapper to run on one file at a time, since I have to do pattern processing
over the entire contents of each file, and the files are small.

The Hadoop streaming manual suggests the following solution:

 *   Generate a file containing the full HDFS path of the input files. Each map task would
get one file name as input.
 *   Create a mapper script which, given a filename, will get the file to local disk, gzip
the file and put it back in the desired output directory.
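
The two steps above can be sketched as a small shell mapper. This is not crash_parser.pl, only an illustration of the manual's gzip example. The function names, the HADOOP override, and the OUT_DIR value are hypothetical. The tab handling assumes the input line may arrive as key, tab, path (as can happen with NLineInputFormat) or as a bare path.

```shell
#!/bin/sh
# Hypothetical mapper sketch: each stdin line names one HDFS file, which is
# fetched, gzipped, and written back, following the manual's example.
HADOOP="${HADOOP:-hadoop}"                       # overridable for dry runs
OUT_DIR="${OUT_DIR:-/user/devi/s_output_files}"  # assumed output directory

# Keep only the text after the last tab, so "offset<TAB>path" and a bare
# path both yield the path.
extract_path() {
  tab=$(printf '\t')
  path=${1##*"$tab"}
  printf '%s\n' "$path"
}

# Fetch one HDFS file to a scratch directory, gzip it, and put it back.
process_one() {
  src=$(extract_path "$1")
  [ -n "$src" ] || return 0
  name=$(basename "$src")
  workdir=$(mktemp -d) || return 1
  "$HADOOP" fs -get "$src" "$workdir/$name" &&
    gzip "$workdir/$name" &&
    "$HADOOP" fs -put "$workdir/$name.gz" "$OUT_DIR/$name.gz"
  status=$?
  rm -rf "$workdir"
  return $status
}

# Read file names from stdin, one per line, and stop on the first failure.
main() {
  while IFS= read -r line; do
    process_one "$line" || exit 1
  done
}

# Uncomment when installing this file as the -mapper script:
# main
```

For per-file pattern processing, the gzip and put steps would be replaced by whatever the mapper actually needs to do with the local copy.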

I am running the following command.

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
  -input /user/devi/file.txt \
  -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

/user/devi/file.txt contains the following two lines.

/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers for a.txt and b.txt as the documentation describes, only
one mapper is spawned, and the Perl script receives both /user/devi/s_input/a.txt and /user/devi/s_input/b.txt
as its input.

How can I make the Perl mapper script run on only one file at a time?

Appreciate your help, Thanks, Devi
