Mailing-List: contact user-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@hadoop.apache.org
MIME-Version: 1.0
In-Reply-To: <5448E769.8090907@gmail.com>
References: <5448E769.8090907@gmail.com>
Date: Thu, 23 Oct 2014 15:11:32 +0200
Message-ID: 
 <CAH70K7664aSaU3Kn5uf1zecf7aVb05c2Y5R4GJrPMgscEg7CCA@mail.gmail.com>
Subject: Re: getting counters from specific hadoop jobs
From: Thomas Demoor <thomas.demoor@amplidata.com>
To: user@hadoop.apache.org
Content-Type: multipart/alternative; boundary=001a1135ec94a6b4ae050616cd8f

--001a1135ec94a6b4ae050616cd8f
Content-Type: text/plain; charset=UTF-8

Hi Bart,

Dieter beat me to it. An alternative would be grepping the job id from the
client logs.
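
A minimal sketch of the grep approach, in case it helps. It assumes the
client output contains an id of the form job_<timestamp>_<sequence> (for
example in a "Running job: ..." line); the function name and the log file
name are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: pull the first MapReduce job id out of captured client output.
# Assumption: the id looks like job_<digits>_<digits> somewhere in the log.

extract_job_id() {
    # Reads log text on stdin and prints the first job id found, if any.
    grep -o 'job_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Hypothetical usage (jar name and arguments are elided on purpose):
#   hadoop jar .... > run.log 2>&1
#   job_id=$(extract_job_id < run.log)
#   mapred job -counter "$job_id" <group-name> <counter-name>
```

Writing the output to a file first and grepping afterwards avoids closing
the pipe while the job client is still running.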

Furthermore, if you write or modify the source code of the applications
yourself rather than using, for instance, the examples included with
Hadoop, you can access the id through job.getJobID() once the job has been
submitted and process (print) it to your liking. More info on the Job
interface:
http://hadoop.apache.org/docs/current/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Job_Submission_and_Monitoring
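
On the scripting side, a rough sketch of how a bash script could pick up
an id that your own driver printed. The "JOB_ID=<id>" line is a
hypothetical convention you would have to add to your driver yourself; it
is not something Hadoop emits, and the function and file names are
invented for the example.

```shell
#!/usr/bin/env bash
# Sketch: read a job id that your own driver printed after submission.
# Assumption (hypothetical convention, not part of Hadoop): the driver
# writes one line of the form "JOB_ID=<id>" to standard output.

read_marked_job_id() {
    # Reads driver output on stdin; prints the id from the first JOB_ID= line.
    sed -n 's/^JOB_ID=//p' | head -n 1
}

# Hypothetical usage (jar name and arguments are elided on purpose):
#   hadoop jar .... > driver.out 2> driver.err
#   job_id=$(read_marked_job_id < driver.out)
#   mapred job -counter "$job_id" <group-name> <counter-name>
```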

Good luck, and nice to see Belgian academics with an interest in Hadoop,
Thomas

Thomas Demoor
skype: demoor.thomas
mobile: +32 497883833

On Thu, Oct 23, 2014 at 1:32 PM, Bart Vandewoestyne <
Bart.Vandewoestyne@telenet.be> wrote:

> Hello list,
>
> In order to learn about Hadoop performance tuning, I am currently
> investigating the effect of certain Hadoop configuration parameters on
> certain Hadoop counters.  I would like to do something like the following
> (from the command line):
>
> for some_config_parameter in set_of_config_values
>
>   Step 1) run hadoop job with 'hadoop jar ....'
>
>   Step 2) once job finished, get the value of one or more Hadoop counters
> of this job
>
> I know that I can achieve step 2 with the -counter option of the mapred
> job command:
>
> bart@sandy-quad-1:~$ mapred job -counter
> Usage: CLI [-counter <job-id> <group-name> <counter-name>]
>
> However, I need to specify a job-id here, and that is where I'm having
> trouble... I don't know an easy way to get the job-id from the hadoop job
> that I started in Step 1.  I also don't know of a way to specify a job-id
> myself in Step 1 so that I can use it later in Step 2.
>
> I cannot imagine I'm the only one trying to run jobs and requesting some
> of the counters afterwards.  How is this typically solved?
>
> Note that I'm looking for a command-line solution, something that is
> scriptable in bash or so.
>
> Thanks,
> Bart
>

--001a1135ec94a6b4ae050616cd8f--