Subject: Re: Write and Read file through map reduce
From: Shahab Yunus <shahab.yunus@gmail.com>
To: user@hadoop.apache.org
Date: Tue, 6 Jan 2015 08:43:51 -0500

DistributedCache has been deprecated for a while. You can use the new
mechanism, which is functionally the same thing, discussed in this thread:

http://stackoverflow.com/questions/21239722/hadoop-distributedcache-is-deprecated-what-is-the-preferred-api
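A minimal sketch of what the replacement looks like (Hadoop 2.x,
org.apache.hadoop.mapreduce API; the HDFS path and class names below are
placeholders, not anything from this thread):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;

    public class CacheExample {

        public static class SideFileMapper extends Mapper<LongWritable, Text, Text, Text> {
            @Override
            protected void setup(Context context) throws IOException, InterruptedException {
                // Replaces DistributedCache.getCacheFiles(): returns whatever
                // the driver registered with job.addCacheFile().
                URI[] cached = context.getCacheFiles();
                if (cached != null && cached.length > 0) {
                    // The '#side' fragment used in the driver symlinks the
                    // file into the task's working directory under that name.
                    BufferedReader reader = new BufferedReader(new FileReader("side"));
                    // ... load the side data here ...
                    reader.close();
                }
            }
        }

        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "cache-example");
            job.setJarByClass(CacheExample.class);
            // Replaces DistributedCache.addCacheFile(); the file must
            // already be in HDFS. Hypothetical path.
            job.addCacheFile(new URI("/user/hitarth/file1#side"));
            // ... set mapper/reducer, input and output paths, then:
            // job.waitForCompletion(true);
        }
    }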
Regards,
Shahab

On Mon, Jan 5, 2015 at 10:57 PM, unmesha sreeveni <unmeshabiju@gmail.com> wrote:

> Hi hitarth,
>
> If your file1 and file2 are small, you can use the Distributed Cache,
> as mentioned here [1].
>
> Or you can use MultipleInputs, as mentioned here [2] (a sketch follows
> at the end of this thread).
>
> [1] http://unmeshasreeveni.blogspot.in/2014/10/how-to-load-file-in-distributedcache-in.html
> [2] http://unmeshasreeveni.blogspot.in/2014/12/joining-two-files-using-multipleinput.html
>
> On Tue, Jan 6, 2015 at 8:53 AM, Ted Yu <yuzhihong@gmail.com> wrote:
>
>> Hitarth:
>> You can also consider MultiFileInputFormat (and its concrete
>> implementations).
>>
>> Cheers
>>
>> On Mon, Jan 5, 2015 at 6:14 PM, Corey Nolet <cjnolet@gmail.com> wrote:
>>
>>> Hitarth,
>>>
>>> I don't know how much direction you are looking for with regards to
>>> the formats of the files, but you can certainly read both files into
>>> the third MapReduce job using FileInputFormat by comma-separating the
>>> paths to the files (a second sketch follows at the end of this
>>> thread). The blocks for both files will essentially be unioned
>>> together and the mappers scheduled across your cluster.
>>>
>>> On Mon, Jan 5, 2015 at 3:55 PM, hitarth trivedi <t.hitarth@gmail.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have a 6-node cluster, and the scenario is as follows:
>>>>
>>>> I have one MapReduce job which will write file1 to HDFS.
>>>> I have another MapReduce job which will write file2 to HDFS.
>>>> In the third MapReduce job I need to use file1 and file2 to do some
>>>> computation and output the value.
>>>>
>>>> What is the best way to store file1 and file2 in HDFS so that they
>>>> can be used in the third MapReduce job?
>>>>
>>>> Thanks,
>>>> Hitarth
>
> --
> Thanks & Regards
>
> Unmesha Sreeveni U.B
> Hadoop, Bigdata Developer
> Centre for Cyber Security | Amrita Vishwa Vidyapeetham
> http://www.unmeshasreeveni.blogspot.in/
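Two sketches for the approaches suggested above. First, the MultipleInputs
route from unmesha's reply: each file is bound to its own mapper, and the
computation over both happens in a shared reducer (new API; the
mapper/reducer class names and paths are hypothetical placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class JoinDriver {
        public static void main(String[] args) throws Exception {
            Job job = Job.getInstance(new Configuration(), "join-file1-file2");
            job.setJarByClass(JoinDriver.class);
            // Each input path gets its own mapper; both mappers emit records
            // keyed on the shared join key, so the reducer sees the matching
            // rows from file1 and file2 together.
            MultipleInputs.addInputPath(job, new Path("/data/file1"),
                    TextInputFormat.class, File1Mapper.class);  // hypothetical mapper
            MultipleInputs.addInputPath(job, new Path("/data/file2"),
                    TextInputFormat.class, File2Mapper.class);  // hypothetical mapper
            job.setReducerClass(JoinReducer.class);             // hypothetical reducer
            FileOutputFormat.setOutputPath(job, new Path("/data/out"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }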
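Second, Corey's comma-separated-paths route: both files flow through a
single mapper class, with their blocks unioned into one set of splits
(paths are placeholders again; this fragment drops into the usual driver
setup alongside the other job configuration):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // addInputPaths() accepts a comma-separated list of paths; splits from
    // both files are generated and scheduled across the cluster, all
    // handled by the one mapper class configured on the job.
    FileInputFormat.addInputPaths(job, "/data/file1,/data/file2");

This fits when file1 and file2 share a format and can be parsed by the
same mapper; when their formats differ, the MultipleInputs sketch above is
the better fit.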