hadoop-common-user mailing list archives

From Amareshwari Sriramadasu <amar...@yahoo-inc.com>
Subject Re: JobTracker History data+analysis
Date Mon, 28 Jul 2008 06:42:07 GMT
HistoryViewer is used by JobClient to display the history files in the
directory given on the command line. The command is

$ bin/hadoop job -history <history-dir>

By default, history is stored in the job's output directory. The
outputDir argument of the HistoryViewer constructor is the directory
passed on the command line.

You can specify a location to store the history files of a particular
job using "hadoop.job.history.user.location". If nothing is specified,
the logs are stored in the job's output directory, i.e.
"mapred.output.dir". The files are stored in "_logs/history/" inside
that directory.
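
As a rough sketch of the lookup order described above (this is not the
actual Hadoop code: the class name, method name, and example paths are
hypothetical, and plain java.util.Properties stands in for the job
configuration):

```java
import java.util.Properties;

// Hypothetical helper, not part of Hadoop. It only mirrors the lookup
// order described in this message, using java.util.Properties as a
// stand-in for the job configuration.
public class HistoryLocationSketch {

    static final String USER_LOCATION_KEY = "hadoop.job.history.user.location";
    static final String OUTPUT_DIR_KEY = "mapred.output.dir";
    static final String HISTORY_SUBDIR = "_logs/history";

    // Returns the directory expected to hold the history files,
    // or null if neither property is set.
    public static String resolveHistoryDir(Properties conf) {
        // Prefer the user-specified history location.
        String base = conf.getProperty(USER_LOCATION_KEY);
        if (base == null || base.isEmpty()) {
            // Otherwise fall back to the job's output directory.
            base = conf.getProperty(OUTPUT_DIR_KEY);
        }
        if (base == null || base.isEmpty()) {
            return null;
        }
        // Avoid a doubled slash when the base already ends with one.
        String sep = base.endsWith("/") ? "" : "/";
        return base + sep + HISTORY_SUBDIR;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        // Example paths are made up for illustration.
        conf.setProperty(OUTPUT_DIR_KEY, "/user/example/job-output");
        System.out.println(resolveHistoryDir(conf));

        conf.setProperty(USER_LOCATION_KEY, "/user/example/history");
        System.out.println(resolveHistoryDir(conf));
    }
}
```

With only "mapred.output.dir" set, the sketch resolves to the
"_logs/history" subdirectory of the output directory; once
"hadoop.job.history.user.location" is set, that location takes
precedence.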

Paco NATHAN wrote:
> Thank you, Amareshwari -
> That helps.  Hadn't noticed HistoryViewer before. It has no JavaDoc.
> What is a typical usage?  In other words, what would be the
> "outputDir" value in the context of ToolRunner, JobClient, etc. ?
> Paco
> On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari Sriramadasu
> <amarsri@yahoo-inc.com> wrote:
>> Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if it
>> makes sense?
>> Thanks
>> Amareshwari
>> Paco NATHAN wrote:
>>> We have a need to access data found in the JobTracker History link.
>>> Specifically in the "Analyse This Job" analysis. Must be run in Java,
>>> between jobs, in the same code which calls ToolRunner and JobClient.
>>> In essence, we need to collect descriptive statistics about task
>>> counts and times for map, shuffle, reduce.
>>> After tracing the flow of the JSP in "src/webapps/job"...  Is there a
>>> better way to get at this data, *not* from the web UI perspective but
>>> from the code?
>>> Tried to find any applicable patterns in JobTracker, ClusterStatus,
>>> JobClient, etc., but no joy.
>>> Thanks,
>>> Paco
