From: "Kartashov, Andy"
To: user@hadoop.apache.org; 'cdh-user@cloudera.org'
Sent: Thursday, November 15, 2012 10:29 AM
Subject: RE: Oozie apparent concurrency deadlocking

Guys,

 

Further to my post below, and after more searching on the web, I believe I have found hints toward a solution to my problem, but I have no idea how to implement them:

 

Qte

… - are you using FairScheduler? If so, and since you mention that
the Sqoop import command is successful, you could be hitting your
per-user job limit.

Whenever Oozie launches a job, it requires two job submissions (if not
more) - one being the monitor+launcher, and the subsequent ones being
the ones that do the real logic work. The launcher job is something
that will launch the remaining jobs, and hence sticks around until
they have all ended - taking up one running job slot for the whole
lifetime of the Oozie job.

For example, with a per user job limit of 3, if you were to run 3
Oozie jobs, the 3 slots would be filled with launchers first. These
would submit their real jobs next, and those would end up being in a
queue - thereby forming a resource deadlock.

The solution is to channel Oozie launcher Hadoop jobs into a dedicated
launcher pool. This pool can have a running-job limit too, but it won't
cause a deadlock because the pools are now separated.

To do this, you need to pass the config property:
"oozie.launcher.<property that specifies your pool>" via WF
<configuration> elements or <job-xml> files to point to the separate
pool.

Unqte
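
If I read that right, the override would go into each sqoop action's <configuration> block, something like the sketch below. The pool names "launcherpool" and "workingpool" and the connect string are placeholders of mine, so please correct me if this is wrong:

<action name="sqoop-import-1">
    <sqoop xmlns="uri:oozie:sqoop-action:0.2">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <configuration>
            <!-- send only the short-lived Oozie launcher job to its own pool -->
            <property>
                <name>oozie.launcher.mapred.fairscheduler.pool</name>
                <value>launcherpool</value>
            </property>
            <!-- the real Sqoop MR job stays in the normal working pool -->
            <property>
                <name>mapred.fairscheduler.pool</name>
                <value>workingpool</value>
            </property>
        </configuration>
        <command>import --connect jdbc:mysql://dbhost/mydb --table mytable</command>
    </sqoop>
    <ok to="import-join"/>
    <error to="fail"/>
</action>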

 

And also

 

Qte

Harsh J <harsh@cloudera.com> wrote:

>> In a FairScheduler environment, especially where max-running-job
>> limits are configured, it is recommended to override the Oozie
>> launcher job's pool to be different than the actual required working
>> pool (for actions that launch other MR jobs).
>>
>> If your scheduler is configured to pick ${user.name} up automatically,
>> then your Oozie launcher config must use the super-override pool name
>> config:
>>
>> oozie.launcher.mapred.fairscheduler.pool=launcherpoolname
>>
>> Your target pool for launchers can still carry limitations, but it
>> should no longer deadlock your actual MR execution (after which the
>> launcher dies away anyway).

 

Unqte
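
For what it's worth, my reading of the scheduler side is a fair-scheduler allocations file along these lines (the pool names and limits are made-up placeholders, not values from anyone's cluster):

<?xml version="1.0"?>
<!-- allocations file pointed to by mapred.fairscheduler.allocation.file -->
<allocations>
  <!-- normal working pool with a per-pool running-job cap -->
  <pool name="workingpool">
    <maxRunningJobs>3</maxRunningJobs>
  </pool>
  <!-- dedicated pool for Oozie launchers; capping it separately means
       launchers can no longer starve the very jobs they are waiting on -->
  <pool name="launcherpool">
    <maxRunningJobs>9</maxRunningJobs>
  </pool>
</allocations>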

 

Please help.

 

Thanks,

Ak-47

 

 

From: Kartashov, Andy
Sent: Thursday, November 15, 2012 9:45 AM
To: user@hadoop.apache.org; 'cdh-user@cloudera.org'
Subject: Oozie apparent concurrency deadlocking

 

Guys,

 

I have struggled with this for the last four days and still cannot find an answer, even after hours of searching the web.

 

I am trying to use an Oozie workflow to execute my previously consecutive Sqoop jobs in parallel. I use a fork that executes 9 sqoop action nodes, as sketched below.
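
The workflow is shaped roughly like this (trimmed to two of the nine actions; the names are mine):

<workflow-app name="sqoop-parallel" xmlns="uri:oozie:workflow:0.2">
    <start to="import-fork"/>
    <fork name="import-fork">
        <path start="sqoop-import-1"/>
        <path start="sqoop-import-2"/>
        <!-- ...paths for the remaining seven sqoop actions... -->
    </fork>
    <!-- every sqoop action transitions to the join on success -->
    <join name="import-join" to="end"/>
    <kill name="fail">
        <message>Sqoop import failed</message>
    </kill>
    <end name="end"/>
</workflow-app>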

 

I had no problem executing the job on a pseudo-distributed cluster, but with an added DN/TT node I ran into what seems like deadlocking. The Oozie web interface displays those jobs as "Running" indefinitely, until I eventually kill the workflow.

 

What I did notice was that if I reduce the number of sqoop action nodes to 3, everything works fine.

 

I found that the oozie.service.CallableQueueService.callable.concurrency property defaults to 3, which hinted that this must be the culprit. I tried to override it by increasing the value to 5 in oozie-site.xml, restarting the Oozie server, and then running 4 sqoop action nodes in the fork, but the result is the same: 2 out of 4 nodes execute successfully (not in the same order every time) while the other 2 hang in an indefinite "Running…" state.
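
Concretely, the override in my oozie-site.xml looks like this:

<!-- oozie-site.xml: raise the callable concurrency limit (default is 3) -->
<property>
    <name>oozie.service.CallableQueueService.callable.concurrency</name>
    <value>5</value>
</property>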

 

There were some suggestions about changing the queue name from the default, but nothing was clear about what to change it to, or where.

 

If someone has found a solution to this, please do share; I would greatly appreciate it.

 

Thanks,

AK47
