hadoop-common-user mailing list archives

From Dingcheng Li <dingche...@gmail.com>
Subject Re: Help on perl streaming
Date Mon, 07 Dec 2015 16:23:18 GMT
Thanks for your quick response. It makes sense that I should put the resource
file and the script into the same directory. Sigh, I cannot test it now since
our Hadoop environment is down for maintenance this week. I will keep you
posted on whether this works.
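
Concretely, the change I plan to try is just the path in the mapper: reference
the shipped file by its basename instead of the local subdirectory (a sketch,
untested until the cluster is back up):

  # before: assumes a salesData/ subdirectory that does not exist in the
  # task's working directory
  my $filter_file = "salesData/salesFilter.txt";
  # after: -file ships salesFilter.txt into the working directory itself
  my $filter_file = "salesFilter.txt";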

Thanks a lot,
Dingcheng

On Sun, Dec 6, 2015 at 6:12 PM, David Morel <dmorel@amakuru.net> wrote:

> On 7 Dec 2015, at 0:21, Dingcheng Li wrote:
>
>> Without it, it works well once I comment out the part of the script that
>> creates and reads the resource file. For Python, with exactly the same file
>> structure, it works. I do not think that the resource file
>> ("salesData/salesFilter.txt") should be in an HDFS directory, since the
>> resource file is like a dictionary which I use to filter words from the
>> input file.
>> Thanks,
>> Dingcheng
>>
>
> So I checked the docs, and the file is copied to the working directory, but
> not into a subdirectory, as I said.
> So removing the subdirectory from the path in your Perl script should work,
> as the data file will be in the same directory as your script.
>
> example taken from https://wiki.apache.org/hadoop/HadoopStreaming
>
> Example: $HSTREAMING -mapper "/usr/local/bin/perl5 filter.pl"
>            -file /local/filter.pl -input "/logs/0604*/*" [...]
>   Ships a script, invokes the non-shipped perl interpreter
>   Shipped files go to the working directory so filter.pl is found by perl
> ...
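>
> So in your mapper, opening the shipped file by its basename should just
> work. A rough sketch of the whole thing (I'm guessing that the filter file
> holds one word per line and that matching records should be dropped):
>
>   my %filter;
>   open(my $ffh, '<', 'salesFilter.txt')
>       or die "Could not open 'salesFilter.txt': $!";
>   while (my $word = <$ffh>) {
>       chomp $word;
>       $filter{$word} = 1;
>   }
>   close($ffh);
>
>   while (my $line = <STDIN>) {
>       chomp $line;
>       my ($store, $sale) = (split /\s+/, $line)[2, 4];
>       next if exists $filter{$store};   # drop records matching the filter
>       print "$store\t$sale\n";
>   }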
>
>> On Sun, Dec 6, 2015 at 5:13 PM, David Morel <dmorel@amakuru.net> wrote:
>>
>>> Your file would probably not be located in a subdirectory in HDFS. Try
>>> without it?
>>> On 6 Dec 2015 at 10:46 PM, "Dingcheng Li" <dingchengl@gmail.com> wrote:
>>>
>>> Hi, folks,
>>>>
>>>> I am using Hadoop streaming to call Perl scripts as mappers. Things are
>>>> working well, but I found that reading the resource file is a problem.
>>>>
>>>> Basically I think I am on the right track: the -file option is the correct
>>>> way to get the resource file shipped and read. I tested this with a Python
>>>> script, but for Perl it always gives a file-not-found error. I noticed
>>>> that in Python “import sys” is used; I am not sure what is needed for
>>>> Perl ("use Sys" does not work). I have a simple test script as follows,
>>>>
>>>>
>>>> #!/usr/bin/perl
>>>> use strict;
>>>> use warnings;
>>>>
>>>> # This is the open that fails on the cluster with "file not found":
>>>> my $filter_file = "salesData/salesFilter.txt";
>>>> open(my $filter_fh, '<', $filter_file)
>>>>     or die "Could not open file '$filter_file' $!";
>>>>
>>>> # Name of the input split being processed, set by Hadoop streaming
>>>> # (the older map_input_file variable did not work either):
>>>> my $filename = $ENV{"mapreduce_map_input_file"};
>>>> print STDERR "Input filename is: $filename\n";
>>>>
>>>> # Emit the store (column 3) and sale (column 5) of each input record:
>>>> foreach (<>) {
>>>>     chomp;
>>>>     my ($store, $sale) = (split(/\s+/, $_))[2, 4];
>>>>     print "$store\t$sale\n";
>>>> }
>>>>
>>>> And the command for it is:
>>>>
>>>> hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
>>>>   -input salesData/sales.txt -output out/sales-out \
>>>>   -mapper perlScripts/salesMapper.pl -file perlScripts/salesMapper.pl \
>>>>   -reducer perlScripts/salesReducer.pl -file perlScripts/salesReducer.pl \
>>>>   -file salesData/salesFilter.txt
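
(Side note: on newer Hadoop releases the streaming -file option is deprecated
in favor of the generic -files option, which takes a comma-separated list and
must come before the streaming-specific options; an equivalent command might
look like:

  hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -files perlScripts/salesMapper.pl,perlScripts/salesReducer.pl,salesData/salesFilter.txt \
    -input salesData/sales.txt -output out/sales-out \
    -mapper salesMapper.pl -reducer salesReducer.pl

With -files, the mapper and reducer are referenced by basename, since the
shipped copies land in each task's working directory.)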
>>>>
>>>>
>>>> Could you guys give some suggestions?
>>>>
>>>>
>>>> Thanks,
>>>>
>>>> Dingcheng
>>>>
>>>>
>>>
