hadoop-user mailing list archives

From Dingcheng Li <dingche...@gmail.com>
Subject Help on perl streaming
Date Sun, 06 Dec 2015 21:46:00 GMT
Hi, folks,

I am using Hadoop streaming to run Perl scripts as the mapper, and the job
itself works well. However, reading a resource file from the mapper fails.

I believe I am on the right track: the -file option should be the correct
way to ship a resource file with the job, and it works when I test with a
Python script. With Perl, though, I always get a file-not-found error. I
noticed that the Python script uses "import sys", and I am not sure what the
Perl equivalent is ("use Sys" does not work). My simple test code is as
follows:


#!/usr/bin/perl

my $filter_file = "salesData/salesFilter.txt";
open(FH, $filter_file) or die "Could not open file '$filter_file' $!";

#my $filename = $0;
#open(my $fh, '<:encoding(UTF-8)', $filename)
#  or die "Could not open file '$filename' $!";

#my $filename = $ENV{"map_input_file"};
my $filename = $ENV{"mapreduce_map_input_file"};
#mapreduce_map_input_file
print STDERR "Input filename is: $filename\n";

#open(my $fh, '<:encoding(UTF-8)', $filename)
#  or die "Could not open file '$filename' $!";

#foreach(<$fh>)
foreach(<>)
{
    chomp;
    #open(FILEHANDLE,"out/sales-out/outfile.txt");
    ($store,$sale) = (split(/\s+/,$_))[2,4];
    print "$store\t$sale\n";
    #print "{0}\t{1}".format($store,$sale);
}

The command I run is:


hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input salesData/sales.txt \
    -output out/sales-out \
    -mapper perlScripts/salesMapper.pl \
    -file perlScripts/salesMapper.pl \
    -reducer perlScripts/salesReducer.pl \
    -file perlScripts/salesReducer.pl \
    -file salesData/salesFilter.txt
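
One possible cause, stated as an assumption and not a confirmed diagnosis:
files shipped with -file are generally made available in the task's working
directory under their base name only, so the client-side directory
(salesData/) would not exist on the task node. Under that assumption, the
mapper would open salesFilter.txt and not salesData/salesFilter.txt. A
minimal sketch follows; shipped_name is a hypothetical helper (not part of
Hadoop), and treating each line of the filter file as one entry is also an
assumption, since the file's format is not shown.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper (not part of Hadoop): map the client-side path passed
# to -file to the name assumed to exist in the task's working directory.
sub shipped_name {
    my ($client_path) = @_;
    return basename($client_path);
}

my $filter_file = shipped_name("salesData/salesFilter.txt");

# Lexical filehandle with the 3-argument form of open. This sketch only warns
# when the file is missing so it can be tried outside Hadoop; a real mapper
# would more likely die here, as the original script does.
my %filter;
if (open(my $fh, '<', $filter_file)) {
    while (my $line = <$fh>) {
        chomp $line;
        $filter{$line} = 1;    # assumption: one filter entry per line
    }
    close($fh);
} else {
    warn "Could not open '$filter_file' in the working directory: $!\n";
}

print STDERR "Loaded ", scalar(keys %filter), " filter entries from $filter_file\n";
```

To try this outside Hadoop, run the script from a directory that contains
salesFilter.txt, which mimics the assumed task layout.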


Could you give me any suggestions?


Thanks,

Dingcheng
