Subject: Re: one newbie question
From: Ted Xu
To: user@hadoop.apache.org
Date: Thu, 9 May 2013 15:25:04 +0800

Hi Balson,

Have you tried NLineInputFormat? You can find an example of NLineInputFormat here: http://goo.gl/aVzDr.
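In your case, the driver and mapper could look roughly like the sketch below. It is untested and written against the new org.apache.hadoop.mapreduce API; the class names, the input path and the binary location are placeholders for illustration, not anything from your setup:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

public class RunBinaryPerFile {

  // Each map task receives one line of input.txt as its value, i.e. the
  // name of one of your .vlc files, and launches the binary on it.
  public static class RunBinaryMapper
      extends Mapper<LongWritable, Text, NullWritable, NullWritable> {

    @Override
    protected void map(LongWritable offset, Text fileName, Context context)
        throws IOException, InterruptedException {
      // Placeholder command: the binary itself must know how to read the
      // file (e.g. from HDFS) and write its output to the agreed location.
      ProcessBuilder pb =
          new ProcessBuilder("/path/to/your/binary", fileName.toString());
      pb.inheritIO(); // forward the binary's stdout/stderr to the task logs (Java 7+)
      int exitCode = pb.start().waitFor();
      if (exitCode != 0) {
        throw new IOException("binary failed on " + fileName + " (exit " + exitCode + ")");
      }
      // No key/value output is emitted; the binary writes its own results.
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "run-binary-per-file");
    job.setJarByClass(RunBinaryPerFile.class);

    // One line of input.txt per split => one file name per map task.
    job.setInputFormatClass(NLineInputFormat.class);
    NLineInputFormat.setNumLinesPerSplit(job, 1);
    FileInputFormat.addInputPath(job, new Path("/user/balson/input.txt"));

    job.setMapperClass(RunBinaryMapper.class);
    job.setNumReduceTasks(0);                         // map-only job
    job.setOutputFormatClass(NullOutputFormat.class); // mappers emit nothing
    job.setOutputKeyClass(NullWritable.class);
    job.setOutputValueClass(NullWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Note that the keys NLineInputFormat hands to the mapper are byte offsets rather than 1, 2, 3, but since the mapper only looks at the value (the file name), that should not matter for your use case.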
On Thu, May 9, 2013 at 2:53 PM, Balachandar R.A. <balachandar.ra@gmail.com> wrote:

> Hello
>
> I would like to explore the possibility of using the MapReduce framework
> for the following problem.
>
> I have a set of huge files. I would like to execute a binary over every
> input file. The binary needs to operate on the whole file, and hence it
> is not possible to split the file into chunks. Let's assume that I have
> six such files and have their names in a single text file. I need to
> write Hadoop code that takes this single file as input, with every line
> in it going to one map task. The map task shall execute the binary on
> that file, which can be located in HDFS. No reduce tasks are needed, and
> no output shall be emitted from the map tasks either. The binary takes
> care of creating the output file in the specified location.
>
> Is there a way to tell Hadoop to feed a single line to each map task? I
> came across a few examples where a set of files is given and the
> framework seems to split each file, read every line in the split,
> generate key/value pairs, and send these pairs to a single map task. In
> my situation, only one key/value pair should be generated per line, and
> it should be given to a single map task. That's it.
>
> For example, assume that this is my file input.txt:
>
> myFirstInput.vlc
> mySecondInput.vlc
> myThirdInput.vlc
>
> Now, the first map task should get the pair <1, myFirstInput.vlc>, the
> second gets the pair <2, mySecondInput.vlc>, and so on.
>
> Can someone throw some light on this problem? To me it looks
> straightforward, but I could not find any pointers on the web.
>
> With thanks and regards,
> Balson

--
Regards,
Ted Xu