Mailing-List: contact mapreduce-user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: mapreduce-user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of hadoop@gmx.com designates
 213.165.64.42 as permitted sender)
Content-Type: multipart/alternative;
 boundary="========GMXBoundary27681333543628550753"
Date: Wed, 04 Apr 2012 14:47:08 +0200
From: jagatsingh@gmail.com
Message-ID: <20120404124708.27680@gmx.com>
MIME-Version: 1.0
Subject: RE: Calling one MR job within another MR job
To: mapreduce-user@hadoop.apache.org

--========GMXBoundary27681333543628550753
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 8bit

Hello Stuti

 From the way you have explained it, it seems we could cache File2 on the nodes ahead of time.

 As an aside, replicated joins in Pig are handled the same way: one of the files to be joined (File2) is cached in memory by the tasks that process File1.
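
 To make the caching idea concrete, here is a minimal sketch in plain Java, with no Hadoop classes, so it only illustrates the lookup logic. It assumes File2 fits in the memory of each task and that both files hold tab-separated "key<TAB>value" lines; the class name, method names and sample records are made up for illustration. In a real job, File2 would probably be shipped to the nodes with something like DistributedCache and loaded once per mapper before the File1 records are processed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReplicatedJoinSketch {

    // Build the in-memory lookup table once, as a mapper would do in its setup step.
    // Each File2 line is assumed to be "key<TAB>value" (hypothetical layout).
    public static Map<String, String> loadLookup(List<String> file2Lines) {
        Map<String, String> lookup = new HashMap<>();
        for (String line : file2Lines) {
            String[] parts = line.split("\t", 2);
            if (parts.length == 2) {
                lookup.put(parts[0], parts[1]);
            }
        }
        return lookup;
    }

    // Join a single File1 record against the cached table, as map() would do per record.
    // Returns null when the record is malformed or File2 has no matching key.
    public static String joinRecord(String file1Line, Map<String, String> lookup) {
        String[] parts = file1Line.split("\t", 2);
        if (parts.length < 2) {
            return null;
        }
        String match = lookup.get(parts[0]);
        return match == null ? null : parts[0] + "\t" + parts[1] + "\t" + match;
    }

    // Drive the whole join: load File2 once, then stream over File1.
    public static List<String> join(List<String> file1Lines, List<String> file2Lines) {
        Map<String, String> lookup = loadLookup(file2Lines);
        List<String> joined = new ArrayList<>();
        for (String line : file1Lines) {
            String result = joinRecord(line, lookup);
            if (result != null) {
                joined.add(result);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Sample records standing in for File1 and File2.
        List<String> file1 = Arrays.asList("u1\tclick", "u2\tview", "u3\tclick");
        List<String> file2 = Arrays.asList("u1\tDelhi", "u3\tNoida");
        for (String row : join(file1, file2)) {
            System.out.println(row);
        }
    }
}
```

 With this approach each File1 record costs one hash lookup instead of a scan over File2, and no second job has to be started from inside the mapper. If File2 is too big to hold in memory, this sketch does not apply as it stands.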

 Regards

 Jagat

----- Original Message -----
From: Stuti Awasthi
Sent: 04/04/12 07:55 AM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Calling one MR job within another MR job

Hi Ravi,
There is no job dependency, so I cannot use chained MR jobs or JobControl as you suggested.
I have two relatively big files. I start processing with File1 as input to the MR1 job, and this processing needs to look up data from File2. One way is to loop through File2 and get the data. The other way is to pass File2 to an MR2 job for parallel processing.
The second option points me towards calling an MR2 job from inside the MR1 job. I am sure this is a common problem that people face. What is the best way to resolve this kind of issue?
Thanks

From: Ravi teja ch n v [mailto:raviteja.chnv@huawei.com]
Sent: Wednesday, April 04, 2012 4:35 PM
To: mapreduce-user@hadoop.apache.org
Subject: RE: Calling one MR job within another MR job

Hi Stuti,
If you want MRjob2 to run after MRjob1, i.e. a job dependency,
you can use the JobControl API, which lets you manage the dependencies.
Calling another Job from a Mapper is not a good idea.
Thanks,
Ravi Teja

-----------------------------------------------------------------

From: Stuti Awasthi [stutiawasthi@hcl.com]
Sent: 04 April 2012 16:04:19
To: mapreduce-user@hadoop.apache.org
Subject: Calling one MR job within another MR job

Hi all,
We have a use case in which I start with a first job, MR1, with File1.txt as its input file, and from this job call another job, MR2, with File2.txt as its input.
So:
MRJob1 {
    Map() {
        MRJob2(File2.txt)
    }
}
MRJob2 {
    Processing...
}
My questions are: is this kind of approach possible, and what are the implications from a performance perspective?
Regards,
Stuti Awasthi
HCL Comnet Systems and Services Ltd
F-8/9 Basement, Sec-3,Noida.


--========GMXBoundary27681333543628550753--