hadoop-common-user mailing list archives

From "Paco NATHAN" <cet...@gmail.com>
Subject Re: JobTracker History data+analysis
Date Tue, 29 Jul 2008 05:33:31 GMT
Thanks Amareshwari -

That could be quite useful for accessing the summary analysis from within code.

Currently HistoryViewer is not written as a public class, which makes it
difficult to use inside application code.

Are there plans to make it a public class?


Paco


On Mon, Jul 28, 2008 at 1:42 AM, Amareshwari Sriramadasu
<amarsri@yahoo-inc.com> wrote:
> HistoryViewer is used in JobClient to view the history files in the
> directory provided on the command line. The command is
> $ bin/hadoop job -history <history-dir>  # by default, history is stored in the output dir
> The outputDir argument in the constructor of HistoryViewer is the directory
> passed on the command line.
>
> You can specify a location to store the history files of a particular job
> using "hadoop.job.history.user.location". If nothing is specified, the logs
> are stored in the job's output directory, i.e. "mapred.output.dir". The
> files are stored in "_logs/history/" inside that directory.
> Thanks
> Amareshwari
>
> Paco NATHAN wrote:
>>
>> Thank you, Amareshwari -
>>
>> That helps.  Hadn't noticed HistoryViewer before. It has no JavaDoc.
>>
>> What is a typical usage?  In other words, what would be the
>> "outputDir" value in the context of ToolRunner, JobClient, etc. ?
>>
>> Paco
>>
>>
>> On Sun, Jul 27, 2008 at 11:48 PM, Amareshwari Sriramadasu
>> <amarsri@yahoo-inc.com> wrote:
>>
>>>
>>> Can you have a look at org.apache.hadoop.mapred.HistoryViewer and see if
>>> it makes sense?
>>>
>>> Thanks
>>> Amareshwari
>>>
>>> Paco NATHAN wrote:
>>>
>>>>
>>>> We need to access data found in the JobTracker History link,
>>>> specifically in the "Analyse This Job" analysis. This must run in Java,
>>>> between jobs, in the same code which calls ToolRunner and JobClient.
>>>> In essence, we need to collect descriptive statistics about task
>>>> counts and times for map, shuffle, reduce.
>>>>
>>>> After tracing the flow of the JSP in "src/webapps/job"...  Is there a
>>>> better way to get at this data, *not* from the web UI perspective but
>>>> from the code?
>>>>
>>>> Tried to find any applicable patterns in JobTracker, ClusterStatus,
>>>> JobClient, etc., but no joy.
>>>>
>>>> Thanks,
>>>> Paco
>>>>
>>>>
>>>
>>>
>
>
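The location rule Amareshwari describes can be sketched in plain Java. This is only an illustration under stated assumptions: the class HistoryLocation and its methods are hypothetical and not part of Hadoop, a plain Map stands in for the job configuration, and only the two property names and the "_logs/history" subdirectory come from the thread.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a Hadoop class: mirrors the rule described in the thread.
final class HistoryLocation {
    static final String USER_LOCATION_KEY = "hadoop.job.history.user.location";
    static final String OUTPUT_DIR_KEY = "mapred.output.dir";
    static final String HISTORY_SUBDIR = "_logs/history";

    // The directory to pass to "bin/hadoop job -history": the user-specified
    // location if set, otherwise the job's output directory.
    static String baseDir(Map<String, String> conf) {
        String userLocation = conf.get(USER_LOCATION_KEY);
        if (userLocation != null && !userLocation.isEmpty()) {
            return userLocation;
        }
        String outputDir = conf.get(OUTPUT_DIR_KEY);
        if (outputDir == null || outputDir.isEmpty()) {
            throw new IllegalArgumentException(
                "neither " + USER_LOCATION_KEY + " nor " + OUTPUT_DIR_KEY + " is set");
        }
        return outputDir;
    }

    // The directory where the history files themselves are expected to live.
    static String historyDir(Map<String, String> conf) {
        String base = baseDir(conf);
        return base.endsWith("/") ? base + HISTORY_SUBDIR : base + "/" + HISTORY_SUBDIR;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(OUTPUT_DIR_KEY, "/user/paco/out"); // illustrative path only
        System.out.println(historyDir(conf));
    }
}
```

With a real job, the same two values would be read from the job's configuration rather than a Map, and the result of baseDir is what the "-history" command expects as its argument.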
