hadoop-common-user mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Issue with Hadoop Streaming
Date Thu, 02 Aug 2012 20:08:59 GMT
http://www.mail-archive.com/core-user@hadoop.apache.org/msg07382.html




From: Devi Kumarappan <kpalania@att.net>
Reply-To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Date: Thursday, August 2, 2012 3:03 PM
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>, "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Subject: Re: Issue with Hadoop Streaming

My mapper is a Perl script, not Java. So how do I specify the NLineInputFormat?
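
One way this could be done from the command line, without writing any Java, is the streaming `-inputformat` option, which takes an input format class name. The sketch below reuses the jar and paths from the original post further down. The two `-D` property names are the old-API names as I understand them for Hadoop 0.20 and should be checked against your version, and the `HADOOP` variable and `submit_job` function are only illustrative names.

```shell
# Paths below are copied from the original post; adjust them for your cluster.
# Set HADOOP=echo to print the arguments instead of submitting the job.
HADOOP=${HADOOP:-hadoop}
STREAMING_JAR=/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar

# Generic -D options go before the streaming-specific options.
# linespermap=1 asks for one input line (one file name) per map task;
# mapred.reduce.tasks=0 makes the job map-only, as the post requires.
submit_job() {
  "$HADOOP" jar "$STREAMING_JAR" \
    -D mapred.line.input.format.linespermap=1 \
    -D mapred.reduce.tasks=0 \
    -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
    -input /user/devi/file.txt \
    -output /user/devi/s_output \
    -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"
}
```

Calling `submit_job` submits the job; `HADOOP=echo submit_job` only prints the arguments, which is a cheap way to check the command before running it.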

________________________________
From: Robert Evans <evans@yahoo-inc.com>
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>; "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: Thu, August 2, 2012 12:59:50 PM
Subject: Re: Issue with Hadoop Streaming

It depends on the input format you use. You probably want to look at using NLineInputFormat.

From: Devi Kumarappan <kpalania@att.net>
Reply-To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Date: Wednesday, August 1, 2012 8:09 PM
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>, "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop streaming with a Perl script as the mapper and no reducer. My
requirement is for the mapper to run on one file at a time, since I have to do pattern processing
over the entire contents of each file, and the files are small.

The Hadoop streaming manual suggests the following solution:

*  Generate a file containing the full HDFS path of the input files. Each map task would get
one file name as input.
*  Create a mapper script which, given a filename, will get the file to local disk, gzip the
file and put it back in the desired output directory.

I am running the following command:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar \
    -input /user/devi/file.txt \
    -output /user/devi/s_output \
    -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"



/user/devi/file.txt contains the following two lines.

/user/devi/s_input/a.txt
/user/devi/s_input/b.txt

When this runs, instead of spawning two mappers for a.txt and b.txt as the documentation describes,
only one mapper is spawned, and the Perl script receives both /user/devi/s_input/a.txt and
/user/devi/s_input/b.txt as its input.



How can I make the Perl mapper script run on only one file at a time?
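
On the mapper side, a rough sketch of the "given a filename, get the file" step from the manual might look like the wrapper below. It rests on several assumptions: that the parser can read a file's contents on stdin (the post does not say how crash_parser.pl consumes its input), that `hadoop fs -cat` is available on the task nodes, and that each input line may arrive prefixed with a byte-offset key and a tab, which streaming can do when the input format is not the default text format. `HADOOP`, `PARSER`, and `process_paths` are hypothetical names.

```shell
# Hypothetical wrapper mapper: handles one HDFS path per input line.
# HADOOP and PARSER can be overridden, for example for a dry run.
HADOOP=${HADOOP:-hadoop}
PARSER=${PARSER:-/usr/bin/perl /home/devi/Perl/crash_parser.pl}
TAB=$(printf '\t')

process_paths() {
  while IFS= read -r line; do
    # Input may look like "offset<TAB>path"; drop everything up to the
    # first tab. Lines without a tab are left unchanged.
    path=${line#*"$TAB"}
    [ -n "$path" ] || continue
    # Assumption: the parser reads the file contents on stdin.
    # PARSER is unquoted on purpose so it splits into command and arguments.
    # </dev/null keeps the fetch command from consuming the remaining input lines.
    "$HADOOP" fs -cat "$path" </dev/null | $PARSER
  done
}
```

Used as a streaming mapper, the script would end with a call to `process_paths`, so that each map task streams the contents of its one file into the parser.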



Appreciate your help, Thanks, Devi




