From: Devi Kumarappan
Date: Thu, 2 Aug 2012 13:03:44 -0700 (PDT)
Subject: Re: Issue with Hadoop Streaming
To: common-user@hadoop.apache.org, "mapreduce-user@hadoop.apache.org"

My mapper is a Perl script, not Java, so how do I specify the NLineInputFormat?
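
For reference, Hadoop Streaming chooses the input format through its `-inputformat` option, which takes the name of a Java class that ships with Hadoop, so the mapper itself can stay in Perl. The sketch below is the command from the original message with that option added. It assumes a Hadoop 0.20-era release such as the CDH3 jar named there (later releases rename these properties), and the `mapred.reduce.tasks=0` setting is an added assumption to make the job map-only.

```shell
#!/bin/sh
# Sketch: the original streaming job plus an explicit input format.
# NLineInputFormat is a Java class bundled with Hadoop; the mapper stays Perl.
STREAMING_JAR=/usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar

# Generic -D options go before the streaming-specific options.
#   linespermap=1  : one line of file.txt (one HDFS path) per map task (1 is the default)
#   reduce.tasks=0 : map-only job, since no reducer is needed
set -- \
  -D mapred.line.input.format.linespermap=1 \
  -D mapred.reduce.tasks=0 \
  -inputformat org.apache.hadoop.mapred.lib.NLineInputFormat \
  -input /user/devi/file.txt \
  -output /user/devi/s_output \
  -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"

# Submit the job only where the hadoop CLI is installed.
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$STREAMING_JAR" "$@"
fi
```

With this input format each map task should see a single line of file.txt, so the Perl script receives one HDFS path per task instead of both.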

________________________________
From: Robert Evans <evans@yahoo-inc.com>
To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>; "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>
Sent: Thu, August 2, 2012 12:59:50 PM
Subject: Re: Issue with Hadoop Streaming

It depends on the input format you use. You probably want to look at using NLineInputFormat.

From: Devi Kumarappan <kpalania@att.net>
Reply-To: "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Date: Wednesday, August 1, 2012 8:09 PM
To: "common-user@hadoop.apache.org" <common-user@hadoop.apache.org>, "mapreduce-user@hadoop.apache.org" <mapreduce-user@hadoop.apache.org>
Subject: Issue with Hadoop Streaming

I am trying to run Hadoop Streaming with a Perl script as the mapper and no reducer. My requirement is for the mapper to run on one file at a time, since I have to do pattern processing on the entire contents of each file, and the files are small.

The Hadoop Streaming manual suggests the following solution:

* Generate a file containing the full HDFS path of the input files. Each map task would get one file name as input.
* Create a mapper script which, given a filename, will get the file to local disk, gzip the file and put it back in the desired output directory.
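
The two steps above can be sketched as a streaming mapper. This is a minimal POSIX sh sketch, not the manual's own script: `gzip_mapper.sh`, `OUTPUT_DIR`, `extract_path`, `process_file` and `run_mapper` are hypothetical names, and it assumes the `hadoop` CLI and `gzip` are installed on every task node. Because streaming may hand the mapper each record as `<byte offset><TAB><path>` when the input format is not the default TextInputFormat, the sketch keeps only the text after the last tab.

```shell
#!/bin/sh
# Hypothetical mapper (gzip_mapper.sh) following the two steps above.
# Assumptions: hadoop and gzip exist on every task node, and OUTPUT_DIR
# (a made-up setting) names an existing HDFS directory.
OUTPUT_DIR=${OUTPUT_DIR:-/user/devi/s_output_gz}

# A record may be "<offset><TAB><path>" (e.g. with NLineInputFormat) or just
# "<path>"; print whatever follows the last tab either way.
extract_path() {
  printf '%s\n' "$1" | awk -F '\t' '{ print $NF }'
}

# Copy one HDFS file to local disk, gzip it, and put the result back.
process_file() {
  hdfs_path=$1
  local_name=$(basename "$hdfs_path")
  hadoop fs -get "$hdfs_path" "$local_name" || return 1
  gzip -f "$local_name" || return 1
  hadoop fs -put "$local_name.gz" "$OUTPUT_DIR/$local_name.gz" || return 1
  # One output line per file, so the job output lists what was processed.
  printf '%s\t%s\n' "$hdfs_path" "$OUTPUT_DIR/$local_name.gz"
}

# Handle every record that streaming writes to stdin.
run_mapper() {
  while IFS= read -r record; do
    path=$(extract_path "$record")
    if [ -n "$path" ]; then
      process_file "$path" || echo "failed: $path" >&2
    fi
  done
}

# Read stdin only when executed as the mapper script itself.
case "$0" in
  *gzip_mapper.sh) run_mapper ;;
esac
```

Such a script could be shipped with the job using streaming's `-file gzip_mapper.sh` option and run with `-mapper "sh gzip_mapper.sh"`. For per-file pattern processing rather than compression, the `gzip` step is where a parser such as crash_parser.pl would be run on the local copy.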

I am running the following command:

hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u3.jar -input /user/devi/file.txt -output /user/devi/s_output -mapper "/usr/bin/perl /home/devi/Perl/crash_parser.pl"



/user/devi/file.txt contains the following two lines:

/user/devi/s_input/a.txt
/user/devi/s_input/b.txt
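
A list file like this (the first step of the manual's recipe) can be built from a directory listing. A minimal sketch, assuming the usual 0.20-style `hadoop fs -ls` layout in which regular files have a permission string starting with `-` and the path is the last column; `list_to_paths` is a hypothetical helper, and paths containing spaces are not handled.

```shell
#!/bin/sh
# Hypothetical helper: read `hadoop fs -ls` output on stdin and print one full
# HDFS path per line. The "Found N items" header and directory entries
# (permission string starting with "d") are skipped.
list_to_paths() {
  awk '$1 ~ /^-/ { print $NF }'
}

# Example use (requires the hadoop CLI):
#   hadoop fs -ls /user/devi/s_input | list_to_paths > file.txt
#   hadoop fs -put file.txt /user/devi/file.txt
```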

When this runs, instead of spawning two mappers for a.txt and b.txt as the document describes, only one mapper is spawned, and the Perl script receives both /user/devi/s_input/a.txt and /user/devi/s_input/b.txt as its input.


How can I make the Perl mapper script process only one file at a time?



Appreciate your help,
Thanks,
Devi


