Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
Received-SPF: pass (nike.apache.org: domain of michael@cloudera.com designates
 209.85.223.176 as permitted sender)
References: <022201cdb3f0$2d5fd2f0$881f78d0$@yahoo.com>
 <CAND0qzsj7Gc+DXRHDrPkz3EUngkDTjcVH57KB5V1Z+bdOk7iWA@mail.gmail.com>
 <034301cdb572$e836b110$b8a41330$@yahoo.com>
From: Michael Katzenellenbogen <michael@cloudera.com>
In-Reply-To: <034301cdb572$e836b110$b8a41330$@yahoo.com>
Mime-Version: 1.0 (1.0)
Date: Sun, 28 Oct 2012 21:33:34 -0400
Message-ID: <1821249225262378034@unknownmsgid>
Subject: Re: Cluster wide atomic operations
To: "user@hadoop.apache.org" <user@hadoop.apache.org>
Content-Type: multipart/alternative; boundary=e89a8f3b9b5fc7617d04cd28a818

--e89a8f3b9b5fc7617d04cd28a818
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: quoted-printable

Twitter's Snowflake may provide you with some inspiration:

https://github.com/twitter/snowflake
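
As a rough illustration of the idea behind Snowflake, here is a minimal Python sketch, not Snowflake's actual code. It packs a millisecond timestamp, a worker ID, and a per-millisecond sequence number into one integer, so each worker can hand out IDs without talking to any other node. The class name, the injectable `clock` parameter, and the exact epoch value are assumptions made for this example. Note that IDs of this shape are large (up to 63 bits), so they would not fit a one-to-one-million constraint.

```python
import threading
import time

# Assumed bit layout: 41 timestamp bits, 10 worker bits, 12 sequence bits.
WORKER_BITS = 10
SEQUENCE_BITS = 12
CUSTOM_EPOCH_MS = 1288834974657  # assumed custom epoch, in milliseconds


class SnowflakeLikeGenerator:
    """Hypothetical generator: unique IDs per worker, no cross-node coordination."""

    def __init__(self, worker_id, clock=None):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        # The clock is injectable so the behaviour can be tested deterministically.
        self._clock = clock or (lambda: int(time.time() * 1000))
        self._lock = threading.Lock()
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        with self._lock:
            now = self._clock()
            if now < self._last_ms:
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                # Same millisecond: bump the sequence, wrapping at 12 bits.
                self._sequence = (self._sequence + 1) & ((1 << SEQUENCE_BITS) - 1)
                if self._sequence == 0:
                    # Sequence exhausted: spin until the next millisecond.
                    while now <= self._last_ms:
                        now = self._clock()
            else:
                self._sequence = 0
            self._last_ms = now
            timestamp = now - CUSTOM_EPOCH_MS
            return (
                (timestamp << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Uniqueness here depends on every worker having a distinct `worker_id`, which still has to be assigned somehow, for example from the task index.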

-Michael

On Oct 28, 2012, at 9:16 PM, David Parks <davidparks21@yahoo.com> wrote:

I need a unique, permanent ID assigned to each new item encountered, with
the constraint that it falls in the range of, let's say for simple
discussion, one to one million.


I suppose I could assign a range of usable IDs to each reduce task (where
IDs are assigned) and keep those organized somehow at the end of the job,
but this seems clunky too.
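
The range-per-reduce-task idea above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the function and class names are hypothetical, the task index and task count are assumed to be known to each reducer, and the one-to-one-million bounds are the example figures from this thread. Each task gets a disjoint slice of the ID space and can report what it did not use at the end of the job.

```python
def id_range_for_task(task_index, num_tasks, first_id=1, last_id=1_000_000):
    """Return the inclusive (start, end) slice of IDs owned by one task."""
    if not 0 <= task_index < num_tasks:
        raise ValueError("task_index out of range")
    total = last_id - first_id + 1
    base, extra = divmod(total, num_tasks)
    # The first `extra` tasks get one additional ID so the whole range is covered.
    start = first_id + task_index * base + min(task_index, extra)
    size = base + (1 if task_index < extra else 0)
    return start, start + size - 1


class RangeAllocator:
    """Hands out IDs from one task's private slice; no coordination needed."""

    def __init__(self, start, end):
        self._next = start
        self._end = end

    def allocate(self):
        if self._next > self._end:
            raise RuntimeError("ID range exhausted for this task")
        value = self._next
        self._next += 1
        return value

    def unused(self):
        """Leftover inclusive range to record at the end of the job, or None."""
        if self._next > self._end:
            return None
        return self._next, self._end
```

The trade-off is that a task with many new items can exhaust its slice while other slices sit mostly empty, which is part of what makes the bookkeeping at the end of the job feel clunky.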


Since this is on AWS, ZooKeeper is not a good option. I thought it was part
of the Hadoop cluster (and thus easy to access), but I guess I was wrong
there.


I would think that such a service would run most logically on the
taskmaster server. I'm surprised this isn't a common issue. I guess I could
launch a separate job that runs such a sequence service, perhaps. But that
is nontrivial in itself, with failure concerns.


Perhaps there's just a better way of thinking of this?


*From:* Ted Dunning [mailto:tdunning@maprtech.com]
*Sent:* Saturday, October 27, 2012 12:23 PM
*To:* user@hadoop.apache.org
*Subject:* Re: Cluster wide atomic operations


This is better asked on the Zookeeper lists.


The first answer is that global atomic operations are generally a bad idea.


The second answer is that if you can batch these operations up then you can
cut the evilness of global atomicity by a substantial factor.
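
One way to picture the batching point is the Python sketch below, where each client reserves a whole block of IDs per global operation instead of one ID at a time. All names are hypothetical, and `CentralCounter` is only an in-process stand-in for whatever shared store would actually hold the counter.

```python
import threading


class CentralCounter:
    """Stand-in for the single shared counter; counts global operations."""

    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()
        self.operations = 0

    def reserve(self, count):
        """Atomically reserve `count` consecutive IDs; return the first one."""
        with self._lock:
            first = self._next
            self._next += count
            self.operations += 1
            return first


class BatchedIdClient:
    """Serves IDs locally and touches the central counter once per batch."""

    def __init__(self, counter, batch_size):
        self._counter = counter
        self._batch_size = batch_size
        self._next = 0
        self._limit = 0  # exclusive upper bound of the current batch

    def next_id(self):
        if self._next >= self._limit:
            # Local batch is used up: one global operation refills it.
            self._next = self._counter.reserve(self._batch_size)
            self._limit = self._next + self._batch_size
        value = self._next
        self._next += 1
        return value
```

With a batch size of 100, handing out 250 IDs costs a client three global operations rather than 250. The price is gaps in the ID space whenever a client stops with a partially used batch.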


Are you sure you need a global counter?

On Fri, Oct 26, 2012 at 11:07 PM, David Parks <davidparks21@yahoo.com>
wrote:

How can we manage cluster-wide atomic operations, such as maintaining an
auto-increment counter?

Does Hadoop provide native support for these kinds of operations?

And in case the ultimate answer involves ZooKeeper, I'd love to work out
doing this in AWS/EMR.

--e89a8f3b9b5fc7617d04cd28a818--