Subject: Re: one newbie question
From: Ted Xu
To: user@hadoop.apache.org
Date: Thu, 9 May 2013 15:25:04 +0800

Hi Balson,

Have you tried NLineInputFormat? You can find an example of NLineInputFormat here: http://goo.gl/aVzDr.
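In your case, the driver and mapper could look roughly like the sketch below. It is untested and written against the new org.apache.hadoop.mapreduce API; the class names, the input path and the binary location are placeholders for illustration, not anything from your setup:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RunBinaryPerFile {

  // Each map task receives one line of input.txt as its value, i.e. the
  // name of one of your .vlc files, and launches the binary on it.
  public static class RunBinaryMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    @Override
    protected void map(LongWritable offset, Text fileName, Context context)
        throws IOException, InterruptedException {
      // Placeholder command: the binary itself must know how to read the
      // file (e.g. from HDFS) and write its output to the agreed location.
      ProcessBuilder pb =
          new ProcessBuilder("/path/to/your/binary", fileName.toString());
      pb.inheritIO(); // forward the binary's stdout/stderr to the task logs (Java 7+)
      int exitCode = pb.start().waitFor();
      if (exitCode != 0) {
        throw new IOException("binary failed on " + fileName + " (exit " + exitCode + ")");
      }
      // No key/value output is emitted; the binary writes its own results.
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "run-binary-per-file");
    job.setJarByClass(RunBinaryPerFile.class);

    // One line of input.txt per split => one file name per map task.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    FileInputFormat.addInputPath(job, new Path("/user/balson/input.txt"));

    job.setMapperClass(RunBinaryMapper.class);
    job.setNumReduceTasks(0);                         // map-only job
    job.setOutputFormatClass(NullOutputFormat.class); // mappers emit nothing
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(NullWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that the keys NLineInputFormat hands to the mapper are byte offsets rather than 1, 2, 3, but since the mapper only looks at the value (the file name), that should not matter for your use case.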
On Thu, May 9, 2013 at 2:53 PM, Balachandar R.A. <balachandar.ra@gmail.com> wrote:

> Hello
>
> I would like to explore the possibility of using the MapReduce framework
> for the following problem.
>
> I have a set of huge files. I would like to execute a binary over every
> input file. The binary needs to operate on the whole file, and hence it
> is not possible to split the file into chunks. Let's assume that I have
> six such files and have their names in a single text file. I need to
> write Hadoop code that takes this single file as input, with every line
> in it going to one map task. The map task shall execute the binary on
> that file, which can be located in HDFS. No reduce tasks are needed, and
> no output shall be emitted from the map tasks either. The binary takes
> care of creating the output file in the specified location.
>
> Is there a way to tell Hadoop to feed a single line to each map task? I
> came across a few examples where a set of files is given and the
> framework seems to split each file, read every line in the split,
> generate key/value pairs, and send these pairs to a single map task. In
> my situation, only one key/value pair should be generated per line, and
> it should be given to a single map task. That's it.
>
> For example, assume that this is my file input.txt:
>
> myFirstInput.vlc
> mySecondInput.vlc
> myThirdInput.vlc
>
> Now, the first map task should get the pair <1, myFirstInput.vlc>, the
> second gets the pair <2, mySecondInput.vlc>, and so on.
>
> Can someone throw some light on this problem? To me it looks
> straightforward, but I could not find any pointers on the web.
>
> With thanks and regards,
> Balson

--
Regards,
Ted Xu