Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: 
 <CAOcnVr3RDsDY4jq27TF+7c370aRYXhvwXr0ywq=W1U1vhmYn+g@mail.gmail.com>
References: 
 <CACijwibU-UeZTZfYtHPRPi_JF3m3_BUyUPPdk_1XwZKJ2i5GeA@mail.gmail.com>
	<CAOcnVr0Cv04SundCUVJk+DyDijUbj_XyVpieYk7mjDB5EpTWMw@mail.gmail.com>
	<CACijwiZLii6+_2RqV=nexK-v3bP2_ef2L1uS7A87e3duHRRC=A@mail.gmail.com>
	<CAOcnVr3RDsDY4jq27TF+7c370aRYXhvwXr0ywq=W1U1vhmYn+g@mail.gmail.com>
Date: Fri, 5 Oct 2012 10:40:50 -0700
Message-ID: 
 <CACijwiaKjRd-z1NxmxZ2GQGrB5VwRkD1OvpYJxSt+DBwbS3ttg@mail.gmail.com>
Subject: Re: Counters that track the max value
From: Jeremy Lewi <jeremy@lewi.us>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=14dae93406bb6b631404cb535fdf

--14dae93406bb6b631404cb535fdf
Content-Type: text/plain; charset=ISO-8859-1

Done.
https://issues.apache.org/jira/browse/MAPREDUCE-4709

Thanks
J

On Fri, Oct 5, 2012 at 10:13 AM, Harsh J <harsh@cloudera.com> wrote:

> Jeremy,
>
> I suppose that's doable; please file a MAPREDUCE JIRA so you can
> discuss this with others on the development side as well.
>
> I am guessing that MAX operations of most of the user-oriented data
> flow front-ends such as Hive and Pig already do this efficiently, so
> perhaps there hasn't been a very strong need for this.
>
> On Fri, Oct 5, 2012 at 9:18 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> > Hi Harsh,
> >
> > Thank you very much, that will work.
> >
> > How come we can't simply create a modification of a regular mapreduce
> > counter which does this behind the scenes? It seems like we should just
> > be able to replace "+" with "max" and everything else should work?
> >
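To make the question above concrete, here is a minimal sketch in plain Java, not Hadoop code. It assumes a counter can be modelled as a running value plus a merge operator; the class name MergingCounter and its methods are hypothetical and exist only for this illustration. Swapping "+" for "max" changes the operator and also the starting value, since a max counter that starts at 0 would be wrong for negative inputs.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.LongBinaryOperator;

// Hypothetical illustration only; this is not a Hadoop class.
final class MergingCounter {
    private final LongBinaryOperator merge;
    private long value;

    // identity is the starting value: 0 for a sum, Long.MIN_VALUE for a max.
    MergingCounter(long identity, LongBinaryOperator merge) {
        this.value = identity;
        this.merge = merge;
    }

    // Fold one more observation into the running value.
    void update(long v) {
        value = merge.applyAsLong(value, v);
    }

    long get() {
        return value;
    }

    // Combine per-task values the way a framework would combine task counters.
    static long aggregate(List<Long> perTaskValues, long identity, LongBinaryOperator merge) {
        MergingCounter total = new MergingCounter(identity, merge);
        for (long v : perTaskValues) {
            total.update(v);
        }
        return total.get();
    }

    public static void main(String[] args) {
        List<Long> perTask = Arrays.asList(3L, 17L, 8L);
        System.out.println(aggregate(perTask, 0L, Long::sum));             // prints 28
        System.out.println(aggregate(perTask, Long.MIN_VALUE, Long::max)); // prints 17
    }
}
```

Both operators are associative and commutative, so the order in which task values arrive does not matter. Whether the real counter classes can be parameterized this way is a question for the JIRA discussion.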
> > J
> >
> >
> > On Wed, Oct 3, 2012 at 9:52 AM, Harsh J <harsh@cloudera.com> wrote:
> >>
> >> Jeremy,
> >>
> >> Here's my shot at it (pardon the quick crappy code):
> >> https://gist.github.com/3828246
> >>
> >> Basically - you can achieve it in two ways:
> >>
> >> Requirement:  All tasks must increment the "max" designated counter
> >> only AFTER the max has been computed (i.e. in cleanup).
> >>
> >> 1. All tasks may use the same counter name. Later, we pull per-task
> >> counters and determine the max at the client. (This is my quick and
> >> dirty implementation)
> >> 2. All tasks may use their own task ID (Number part) in the counter
> >> name, but use the same group. Later, we fetch all counters for that
> >> group and iterate over it to find the max. This is cleaner, and
> >> doesn't end up using deprecated APIs such as the above.
> >>
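The second option above can be sketched without any Hadoop dependency. This is a plain-Java stand-in, not the code from the gist: a Map plays the role of the counter group, and the names PerTaskMaxDemo, publishLocalMax, maxOf and the "task_" prefix are all assumptions made for the example. In a real job each task would presumably do the publishing step once in cleanup, and the client would read the group from the finished job's counters.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.OptionalLong;

// Plain-Java stand-in for approach 2; no Hadoop classes are used here.
final class PerTaskMaxDemo {

    // What a task would do once, after its local max is known:
    // increment a counter whose name is derived from the task number.
    static void publishLocalMax(Map<String, Long> group, int taskNumber, long localMax) {
        // merge(...) mimics "increment": the counter starts absent and is bumped once.
        group.merge("task_" + taskNumber, localMax, Long::sum);
    }

    // What the client would do: iterate over every counter in the group
    // and keep the largest value. Empty if no task published anything.
    static OptionalLong maxOf(Map<String, Long> group) {
        return group.values().stream().mapToLong(Long::longValue).max();
    }

    public static void main(String[] args) {
        Map<String, Long> group = new LinkedHashMap<>();
        publishLocalMax(group, 0, 3L);
        publishLocalMax(group, 1, 17L);
        publishLocalMax(group, 2, 8L);
        System.out.println(maxOf(group).getAsLong()); // prints 17
    }
}
```

Because each task writes its counter exactly once, the increment behaves like a plain assignment, which is why the requirement about updating only in cleanup matters. One likely drawback is that this creates one counter per task, and Hadoop limits the number of counters a job may define.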
> >> Does this help?
> >>
> >> On Wed, Oct 3, 2012 at 8:47 PM, Jeremy Lewi <jeremy@lewi.us> wrote:
> >> > Hi hadoop-users,
> >> >
> >> > I'm curious if there is an implementation somewhere of a counter which
> >> > tracks the maximum of some value across all mappers or reducers?
> >> >
> >> > Thanks
> >> > J
> >>
> >>
> >>
> >> --
> >> Harsh J
> >
> >
>
>
>
> --
> Harsh J
>

--14dae93406bb6b631404cb535fdf--