Date: Sun, 6 Dec 2015 15:46:00 -0600
Subject: Help on perl streaming
From: Dingcheng Li
To: user@hadoop.apache.org

Hi, folks,

I am using Hadoop Streaming to call Perl scripts as the mapper. Things are mostly working, but reading a resource file is a problem.

I think I am on the right track: the -file option is the correct way to ship a resource file so the task can read it, and I tested this successfully with a Python script. For Perl, however, it always gives a "file not found" error. I noticed that the Python version uses "import sys"; I am not sure what the Perl equivalent is.
I have a simple test script as follows ("use Sys" did not work):

#!/usr/bin/perl
use strict;
use warnings;

# Resource file shipped to the task with -file
my $filter_file = "salesData/salesFilter.txt";
open(my $filter_fh, '<', $filter_file)
    or die "Could not open file '$filter_file' $!";

# Name of the input split being processed
# (was map_input_file in older releases)
my $filename = $ENV{"mapreduce_map_input_file"};
print STDERR "Input filename is: $filename\n";

# Read records from STDIN and emit "store<TAB>sale"
foreach (<>) {
    chomp;
    my ($store, $sale) = (split(/\s+/, $_))[2, 4];
    print "$store\t$sale\n";
}

And the command for it is:

hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input salesData/sales.txt \
    -output out/sales-out \
    -mapper perlScripts/salesMapper.pl \
    -file perlScripts/salesMapper.pl \
    -reducer perlScripts/salesReducer.pl \
    -file perlScripts/salesReducer.pl \
    -file salesData/salesFilter.txt

Could you give any suggestions?

Thanks,
Dingcheng
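A note on the likely cause, sketched under the assumption of standard Hadoop Streaming behavior: a file shipped with -file is placed in each task's working directory under its basename only, so a mapper that opens "salesData/salesFilter.txt" will not find the shipped copy; it should open "salesFilter.txt". A minimal Perl sketch of deriving that name:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;

# -file salesData/salesFilter.txt ships the file into the task's
# working directory under its basename only, so the mapper should
# open "salesFilter.txt", not the client-side path.
my $shipped = basename("salesData/salesFilter.txt");
print "$shipped\n";    # prints "salesFilter.txt"

# In the mapper, the open would then be:
# open(my $fh, '<', $shipped)
#     or die "Could not open file '$shipped' $!";
```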
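The mapper's field extraction can also be exercised locally on a single sample line before submitting the job; the record layout below (whitespace-separated, store in field 2 and sale in field 4, counting from 0) is an assumption inferred from the indices [2,4] in the script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sales record: date  time  store  item  amount
my $line = "2015-12-06 21:46 store42 widget 9.99";
chomp $line;

# Same extraction as the mapper: whitespace split, fields 2 and 4
my ($store, $sale) = (split(/\s+/, $line))[2, 4];
print "$store\t$sale\n";    # prints "store42", a tab, then "9.99"
```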