Subject: Re: Counters that track the max value
From: Jeremy Lewi <jeremy@lewi.us>
To: user@hadoop.apache.org
Date: Fri, 5 Oct 2012 10:40:50 -0700

Done. https://issues.apache.org/jira/browse/MAPREDUCE-4709

Thanks
J

On Fri, Oct 5, 2012 at 10:13 AM, Harsh J <harsh@cloudera.com> wrote:
Jeremy,

I suppose that's doable; please file a MAPREDUCE JIRA so you can
discuss this with others on the development side as well.

I am guessing that MAX operations of most of the user-oriented data
flow front-ends such as Hive and Pig already do this efficiently, so
perhaps there hasn't been a very strong need for this.

On Fri, Oct 5, 2012 at 9:18 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> Hi Harsh,
>
> Thank you very much; that will work.
>
> How come we can't simply create a modification of a regular mapreduce
> counter which does this behind the scenes? It seems like we should just be
> able to replace "+" with "max" and everything else should work?
>
> J
>
>
> On Wed, Oct 3, 2012 at 9:52 AM, Harsh J <harsh@cloudera.com> wrote:
>>
>> Jeremy,
>>
>> Here's my shot at it (pardon the quick crappy code):
>> https://gist.github.com/3828246
>>
>> Basically, you can achieve it in two ways:
>>
>> Requirement: All tasks must increment the "max" designated counter
>> only AFTER the max has been computed (i.e. in cleanup).
>>
>> 1. All tasks may use the same counter name. Later, we pull per-task
>> counters and determine the max at the client. (This is my quick and
>> dirty implementation.)
>> 2. All tasks may use their own task ID (number part) in the counter
>> name, but use the same group. Later, we fetch all counters for that
>> group and iterate over them to find the max. This is cleaner, and
>> doesn't end up using deprecated APIs such as the above.
>>
>> Does this help?
>>
>> On Wed, Oct 3, 2012 at 8:47 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
>> > Hi hadoop-users,
>> >
>> > I'm curious if there is an implementation somewhere of a counter which
>> > tracks the maximum of some value across all mappers or reducers?
>> >
>> > Thanks
>> > J
>>
>>
>>
>> --
>> Harsh J
>
>



--
Harsh J
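
Harsh's second option can be modeled in plain Java without a cluster. This is a sketch under stated assumptions, not the code from the linked gist. A `Map` stands in for one Hadoop counter group, and the group name `MaxTracking` and the class name are hypothetical. In a real job, the per-task write would roughly correspond to `context.getCounter(GROUP, Integer.toString(taskNumber)).increment(localMax)` in `cleanup()`. The client-side scan would correspond to iterating over `job.getCounters().getGroup(GROUP)`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Plain-Java model of "one counter per task, same group, max at the client".
 * The Map is a stand-in for a Hadoop counter group (counter name -> value);
 * no Hadoop classes are used here.
 */
public class MaxViaCounterGroup {
    // Hypothetical group name, used only in this sketch.
    static final String GROUP = "MaxTracking";

    // Stand-in for the job-wide counters of GROUP.
    private final Map<String, Long> group = new LinkedHashMap<>();

    /** Simulates one task: compute the local max first, publish it once at the end. */
    void runTask(int taskNumber, long[] values) {
        if (values.length == 0) {
            return; // a task that saw no records publishes nothing
        }
        long localMax = Long.MIN_VALUE;
        for (long v : values) {
            localMax = Math.max(localMax, v);
        }
        // Published only after the max is known (the "in cleanup" requirement).
        // Counters sum increments; a unique name per task keeps tasks from being added together.
        group.merge(Integer.toString(taskNumber), localMax, Long::sum);
    }

    /** Client side: scan every counter in the group and keep the largest value. */
    long maxAcrossTasks() {
        long max = Long.MIN_VALUE;
        for (long v : group.values()) {
            max = Math.max(max, v);
        }
        return max;
    }

    public static void main(String[] args) {
        MaxViaCounterGroup job = new MaxViaCounterGroup();
        job.runTask(0, new long[] {3, 17, 5});
        job.runTask(1, new long[] {42, 8});
        job.runTask(2, new long[] {-1, 9});
        System.out.println("max = " + job.maxAcrossTasks());
    }
}
```

Running `main` prints `max = 42`. Giving each task its own counter name is what makes the summing semantics of ordinary counters harmless. Each counter receives exactly one increment, so its value equals that task's local max.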

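Jeremy's "replace + with max" question can be illustrated with a small hypothetical class. `MaxCounter` is not a Hadoop type. The sketch shows why the idea is plausible in principle. Like +, max is associative and commutative, and it has an identity element (`Long.MIN_VALUE`, playing the role that 0 plays for sums). Per-task values can therefore be merged in any order, and a task that saw no records cannot skew the result.

```java
/**
 * Hypothetical max-aggregating counter (not part of Hadoop). It has the same
 * shape as a summing counter, but both local updates and merges use Math.max.
 */
public class MaxCounter {
    // Identity element for max, analogous to 0 for +.
    private long value = Long.MIN_VALUE;

    /** Local update inside a task: keep the larger value instead of adding. */
    public void update(long v) {
        value = Math.max(value, v);
    }

    /** Merge another task's counter; max is associative and commutative, so order is irrelevant. */
    public void merge(MaxCounter other) {
        value = Math.max(value, other.value);
    }

    public long getValue() {
        return value;
    }

    public static void main(String[] args) {
        MaxCounter a = new MaxCounter();
        MaxCounter b = new MaxCounter();
        MaxCounter c = new MaxCounter(); // saw no records, stays at the identity
        for (long v : new long[] {3, 17, 5}) a.update(v);
        for (long v : new long[] {42, 8}) b.update(v);

        MaxCounter forward = new MaxCounter();
        forward.merge(a); forward.merge(b); forward.merge(c);
        MaxCounter backward = new MaxCounter();
        backward.merge(c); backward.merge(b); backward.merge(a);

        System.out.println(forward.getValue() + " " + backward.getValue());
    }
}
```

Running `main` prints `42 42`, so both merge orders agree. The model leaves out the hard part: wiring such a counter into Hadoop's task reporting and job-level aggregation, which would require framework changes. That is what MAPREDUCE-4709 was opened to discuss.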