From: Michael Katzenellenbogen
To: user@hadoop.apache.org
Date: Sun, 28 Oct 2012 21:33:34 -0400
Subject: Re: Cluster wide atomic operations
Message-ID: <1821249225262378034@unknownmsgid>

Twitter's Snowflake may provide you with some inspiration:

https://github.com/twitter/snowflake

-Michael

On Oct 28, 2012, at 9:16 PM, David Parks <davidparks21@yahoo.com> wrote:

I need a unique and permanent ID assigned to each new item encountered, with the constraint that it falls in the range of, let's say for simple discussion, one to one million.

I suppose I could assign a range of usable IDs to each reduce task (where IDs are assigned) and keep those organized somehow at the end of the job, but this seems clunky too.

Since this is on AWS, ZooKeeper is not a good option. I thought it was part of the Hadoop cluster (and thus easy to access), but I guess I was wrong there.

I would think that such a service would run most logically on the taskmaster server. I'm surprised this isn't a common issue. I guess I could launch a separate job that runs such a sequence service, perhaps. But that's nontrivial in itself, given failure concerns.

Perhaps there's just a better way of thinking of this?

*From:* Ted Dunning [mailto:tdunning@maprtech.com]
*Sent:* Saturday, October 27, 2012 12:23 PM
*To:* user@hadoop.apache.org
*Subject:* Re: Cluster wide atomic operations

This is better asked on the ZooKeeper lists.

The first answer is that global atomic operations are generally a bad idea.

The second answer is that if you can batch these operations up, then you can cut the evilness of global atomicity by a substantial factor.

Are you sure you need a global counter?

On Fri, Oct 26, 2012 at 11:07 PM, David Parks <davidparks21@yahoo.com> wrote:

How can we manage cluster-wide atomic operations, such as maintaining an auto-increment counter?

Does Hadoop provide native support for these kinds of operations?

And in case the ultimate answer involves ZooKeeper, I'd love to work out doing this in AWS/EMR.
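
The idea raised in this thread of giving each reduce task its own range of IDs, which is also one way of batching the global operation, might look roughly like the sketch below. It is only an illustration under assumptions: the names `id_block` and `BlockIdAllocator` are hypothetical, the number of reduce tasks is assumed to be known up front, and each task is assumed to know its own zero-based index. Nothing here is a Hadoop or ZooKeeper API.

```python
def id_block(task_index, num_tasks, lo=1, hi=1_000_000):
    """Return the inclusive (start, end) ID range owned by one task.

    The range lo..hi is split into num_tasks contiguous blocks; the first
    (total % num_tasks) blocks get one extra ID so nothing is left over.
    """
    total = hi - lo + 1
    base, extra = divmod(total, num_tasks)
    start = lo + task_index * base + min(task_index, extra)
    size = base + (1 if task_index < extra else 0)
    return start, start + size - 1


class BlockIdAllocator:
    """Hands out IDs from a private block with no cross-task coordination."""

    def __init__(self, start, end):
        self._next = start
        self._end = end

    def next_id(self):
        # Fail loudly rather than spill into another task's block.
        if self._next > self._end:
            raise RuntimeError("ID block exhausted")
        value = self._next
        self._next += 1
        return value


# Hypothetical usage inside reduce task 1 of 4.
allocator = BlockIdAllocator(*id_block(1, 4))
first_id = allocator.next_id()
```

Because each task only touches its own block, no global counter is needed while the job runs. The trade-offs are that unused IDs in a block leave gaps, and that keeping IDs permanent across jobs would still require recording somewhere which IDs each block has already handed out.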