hadoop-general mailing list archives

From Ranjit Mathew <ran...@yahoo-inc.com>
Subject Re: Listing Hadoop Job History Statistics
Date Tue, 17 Aug 2010 04:29:42 GMT
[BCC-ing "general" - again.]

On Tuesday 17 August 2010 07:36 AM, Scott Whitecross wrote:
> Thanks for the answers Doug and Arun.   I'm assuming the job-history files
> mentioned are in ./hadoop-0.20/logs/history/done/.  The files look like they
> were serialized by a class in Hadoop?  (If I can read the files back into
> the appropriate class, and then dump them out into a custom format, that'd
> be great.)

Rumen (src/tools/org/apache/hadoop/tools/rumen/) parses Job History files
and creates JSON files that can either be loaded independently or via
the API provided by Rumen itself. As an added benefit, it abstracts away
the differences between the 0.20.xx format and the Avro-based format used
in trunk.
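
For what it's worth, here is a rough sketch of the "load independently"
route in Python. It is not taken from Rumen's documentation. It assumes the
JSON trace is a sequence of concatenated JSON objects, one per job, and
that each object carries "jobID", "jobName" and "submitTime" (milliseconds
since the epoch). Check those names against the output of your Rumen
version. The function names are made up for illustration.

```python
import json


def iter_json_objects(text):
    """Yield each top-level JSON value from a string of concatenated JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode() does not skip leading whitespace, so do it by hand.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


def recent_job_names(text, now_ms, window_days=7):
    """Return (jobID, jobName) pairs for jobs submitted within the last window_days.

    Assumption: "submitTime" is in milliseconds since the epoch; jobs
    without that field are treated as too old and skipped.
    """
    cutoff = now_ms - window_days * 24 * 60 * 60 * 1000
    return [
        (job.get("jobID"), job.get("jobName"))
        for job in iter_json_objects(text)
        if job.get("submitTime", 0) >= cutoff
    ]
```

To list last week's jobs, read the trace file into a string and call
recent_job_names(contents, int(time.time() * 1000)). Reading the whole file
into memory keeps the sketch short; a large trace would want a streaming
parser instead.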

There is not much documentation on Rumen right now, but MAPREDUCE-1918
(https://issues.apache.org/jira/browse/MAPREDUCE-1918) attempts to fix that.


> On Thu, Aug 12, 2010 at 12:52 AM, Arun C Murthy<acm@yahoo-inc.com>  wrote:
>> Moving to mapreduce-user@, bcc general@.
>> There isn't a direct way. One possible option is just use the per-job
>> job-history file which is on HDFS (See
>> http://hadoop.apache.org/common/docs/r0.20.0/mapred_tutorial.html#Job+Submission+and+Monitoring
>> for info on job-history).
>> Hope that helps.
>> Arun
>> On Aug 11, 2010, at 8:54 AM, Scott Whitecross wrote:
>>> Hi -
>>> What's the best way to list and query information on Hadoop job histories?
>>> For example, I'd like to see the job names from the past week against a
>>> Hadoop cluster I'm using.   I don't see an API call or a way through the
>>> command line to pull the information.  Is the best way writing a quick
>>> script to process the job history files?
>>> Thanks.
>>> Scott
