hadoop-mapreduce-user mailing list archives

From Dingcheng Li <dingche...@gmail.com>
Subject Re: Help on perl streaming
Date Sun, 06 Dec 2015 23:21:46 GMT
Without it, the job works well once I comment out the part of the script
that opens and reads the resource file. With Python and exactly the same
file structure, it works. I do not think the resource file
("salesData/salesFilter.txt") should be in an HDFS directory, since it is
like a dictionary that I use to filter words from the input file.
Thanks,
Dingcheng

On Sun, Dec 6, 2015 at 5:13 PM, David Morel <dmorel@amakuru.net> wrote:

> Your file would probably not be located in a subdirectory in HDFS. Try
> without it?
> On Dec 6, 2015 10:46 PM, "Dingcheng Li" <dingchengl@gmail.com> wrote:
>
>> Hi, folks,
>>
>> I am using Hadoop streaming to call Perl scripts as the mapper. Things
>> are working well, but I found that reading the resource file is a problem.
>>
>> Basically, I think I am on the right track: the -file option is the
>> correct way to make a resource file readable. I tested this with a Python
>> script, but with Perl it always gives a file-not-found error. I noticed
>> that "import sys" is used in Python; I am not sure what is needed for
>> Perl. I have a simple test script as follows ("use Sys" does not work):
>>
>>
>> #!/usr/bin/perl
>>
>> my $filter_file = "salesData/salesFilter.txt";
>> open(FH, $filter_file) or die "Could not open file '$filter_file' $!";
>>
>> #my $filename = $0;
>> #open(my $fh, '<:encoding(UTF-8)', $filename)
>>  # or die "Could not open file '$filename' $!";
>>
>> #my $filename = $ENV{"map_input_file"};
>> my $filename = $ENV{"mapreduce_map_input_file"};
>> #mapreduce_map_input_file
>> print STDERR "Input filename is: $filename\n";
>>
>> #open(my $fh, '<:encoding(UTF-8)', $filename)
>>  # or die "Could not open file '$filename' $!";
>>
>> #foreach(<$fh>)
>> foreach(<>)
>> {
>>  chomp;
>>  #open(FILEHANDLE,"out/sales-out/outfile.txt");
>>  ($store,$sale) = (split(/\s+/,$_))[2,4];
>>  print "$store\t$sale\n";
>>  #print "{0}\t{1}".format($store,$sale);
>> }
>>
>> And the command for it is,
>>
>>
>> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar -input
>> salesData/sales.txt -output out/sales-out -mapper
>> perlScripts/salesMapper.pl -file perlScripts/salesMapper.pl -reducer
>> perlScripts/salesReducer.pl -file perlScripts/salesReducer.pl -file
>> salesData/salesFilter.txt
>>
>>
>> Could you give me some suggestions?
>>
>>
>> Thanks,
>>
>> Dingcheng
>>
>
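
A note on the likely cause, offered as a sketch rather than a confirmed
diagnosis: Hadoop Streaming's -file option ships each listed file into the
task's working directory, so on the task node the filter would be reachable
as "salesFilter.txt" and not as "salesData/salesFilter.txt". The variant of
the mapper below opens the file by its base name. The helper load_filter,
the one-word-per-line layout of the filter file, and the choice to drop
records whose store field appears in the filter are assumptions made for
illustration; none of them comes from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename qw(basename);

# Path of the filter file on the submitting machine (taken from the thread).
my $local_path = "salesData/salesFilter.txt";

# Assumption: -file places the file in the task's working directory,
# so only its base name is a valid path on the task node.
my $filter_file = basename($local_path);

# Hypothetical helper: read one filter word per line into a hash reference.
# The one-word-per-line layout is an assumption about salesFilter.txt.
sub load_filter {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Could not open file '$path': $!";
    my %words;
    while (my $line = <$fh>) {
        chomp $line;
        $words{$line} = 1 if length $line;   # skip blank lines
    }
    close($fh);
    return \%words;
}

sub main {
    my $filter = load_filter($filter_file);

    # Streaming feeds the input split to the mapper on STDIN.
    while (my $line = <STDIN>) {
        chomp $line;
        my ($store, $sale) = (split /\s+/, $line)[2, 4];
        next unless defined $store && defined $sale;
        # Assumed use of the filter: drop records for listed stores.
        next if $filter->{$store};
        print "$store\t$sale\n";
    }
}

# Run the mapper only when this file is executed directly.
main() unless caller;
```

For a local dry run, copying salesMapper.pl and salesFilter.txt into one
empty directory and piping a few lines of sales.txt into the script from
there mimics that flat layout.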
