hadoop-hdfs-user mailing list archives

From Bart Vandewoestyne <Bart.Vandewoest...@telenet.be>
Subject getting counters from specific hadoop jobs
Date Thu, 23 Oct 2014 11:32:57 GMT
Hello list,

In order to learn about Hadoop performance tuning, I am currently 
investigating the effect of certain Hadoop configuration parameters on 
certain Hadoop counters.  I would like to do something like the 
following (from the command line):

for some_config_parameter in set_of_config_values

   Step 1) run hadoop job with 'hadoop jar ....'

   Step 2) once job finished, get the value of one or more Hadoop 
counters of this job

I know that I can achieve step 2 with the -counter option of the mapred 
job command:

bart@sandy-quad-1:~$ mapred job -counter
Usage: CLI [-counter <job-id> <group-name> <counter-name>]

However, I need to specify a job-id here, and that is where I'm having 
trouble... I don't know an easy way to get the job-id from the hadoop 
job that I started in Step 1.  I also don't know of a way to specify a 
job-id myself in Step 1 so that I can use it later in Step 2.

I cannot imagine I'm the only one trying to run jobs and requesting some 
of the counters afterwards.  How is this typically solved?

Note that I'm looking for a command-line solution, something that can 
be scripted in bash or a similar shell.
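
For illustration, the two steps above could be sketched in bash roughly as 
follows. This is only a sketch under assumptions, not a confirmed solution: 
it assumes the 'hadoop jar' client writes a log line containing the job id 
(for example "Running job: job_..."), which depends on the client's logging 
configuration, and that the job's driver accepts -D overrides (for example 
via ToolRunner). The function names extract_job_id and run_and_get_counter 
are hypothetical helpers, and the jar, driver class, parameter and counter 
names are placeholders supplied by the caller.

```shell
#!/usr/bin/env bash
# Sketch only: both helper functions are hypothetical, not part of Hadoop.

# Print the first job id (job_<timestamp>_<sequence>) found in the text on stdin.
# Prints nothing if no job id is present.
extract_job_id() {
    grep -o 'job_[0-9][0-9]*_[0-9][0-9]*' | head -n 1
}

# Step 1 and Step 2 for a single configuration value.
# Usage: run_and_get_counter <jar> <driver> <param> <value> <group> <counter> [job args...]
# Assumption: the driver honours generic -D options (e.g. it uses ToolRunner).
run_and_get_counter() {
    local jar=$1 driver=$2 param=$3 value=$4 group=$5 counter=$6
    shift 6
    local log job_id

    # Step 1: run the job, capturing stdout and stderr of the client.
    log=$(hadoop jar "$jar" "$driver" -D "$param=$value" "$@" 2>&1) || return 1

    # Assumption: the client output mentions the job id at least once.
    job_id=$(printf '%s\n' "$log" | extract_job_id)
    if [ -z "$job_id" ]; then
        echo "no job id found in client output" >&2
        return 1
    fi

    # Step 2: query one counter of the finished job.
    mapred job -counter "$job_id" "$group" "$counter"
}
```

A caller would then loop over the values of interest and invoke 
run_and_get_counter once per value, passing the counter group and counter 
name exactly as 'mapred job -counter' expects them. Capturing the client 
output is a workaround for not being able to choose the job id up front; it 
breaks if the client logging is silenced or redirected elsewhere.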

