hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1918) Add documentation to Rumen
Date Wed, 21 Jul 2010 17:54:52 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12890795#action_12890795

Hong Tang commented on MAPREDUCE-1918:

I think we should also describe (1) that the JSON objects are created through Jackson's ObjectMapper
from the LoggedXXX classes; (2) the API for building LoggedXXX objects, and how to
read them back.

The basic API flow for creating parsed Rumen objects is as follows (it is the user's responsibility
to create input streams from the job conf xml and job history logs):
- JobConfigurationParser: parser that parses job conf xml. One instance can be reused to parse
many job conf xml files.
	JobConfigurationParser jcp = new JobConfigurationParser(interestedProperties); // interestedProperties
is a list of keys to be extracted from the job conf xml file.
	Properties parsedProperties = jcp.parse(inputStream); // inputStream is the file input stream
for the job conf xml file.
- JobHistoryParser: parser that parses job history files. It is an interface, and the actual implementations
are registered as enums in JobHistoryParserFactory. One can directly use the implementation matching
the version of the job history logs, or use the "canParse()" method to detect which
parser is suitable for parsing the job history logs (following the pattern in TraceBuilder).
Create one instance per job history log and close it after use.
	JobHistoryParser parser = new Hadoop20JHParser(inputStream); // inputStream is the file input
stream for the job history file.
	// JobHistoryParser APIs will be used later when being fed into JobBuilder (below).

- JobBuilder: builder for LoggedJob objects. Create one instance per pairing of job history
log and job conf. The order of parsing the conf file and the job history file is not important.
	JobBuilder jb = new JobBuilder(jobID); // you will need to extract the job ID from the file
name: <jobtracker>_job_<timestamp>_<sequence>
	jb.process(parsedProperties); // feed in the properties parsed by JobConfigurationParser
	JobHistoryParser parser = new Hadoop20JHParser(jobHistoryInputStream);
	try {
		HistoryEvent e;
		while ((e = parser.nextEvent()) != null) {
			jb.process(e); // feed each history event into the builder
		}
	} finally {
		parser.close();
	}
	LoggedJob job = jb.build();
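Putting the steps above together, a minimal end-to-end sketch might look like the following. This is a sketch, not the authoritative API: the import paths, the JobBuilder.process() overloads for Properties and HistoryEvent, and the helper signature are assumptions based on the flow described above, and the caller is assumed to have opened the two input streams and extracted the job ID from the file name.

```java
import java.io.InputStream;
import java.util.List;
import java.util.Properties;
import org.apache.hadoop.mapreduce.jobhistory.HistoryEvent;
import org.apache.hadoop.tools.rumen.Hadoop20JHParser;
import org.apache.hadoop.tools.rumen.JobBuilder;
import org.apache.hadoop.tools.rumen.JobConfigurationParser;
import org.apache.hadoop.tools.rumen.JobHistoryParser;
import org.apache.hadoop.tools.rumen.LoggedJob;

public class RumenParseSketch {
  // confInput / historyInput: streams opened by the caller;
  // jobID: extracted from the file name <jobtracker>_job_<timestamp>_<sequence>
  static LoggedJob parseJob(String jobID, InputStream confInput,
      InputStream historyInput, List<String> interestedProperties)
      throws Exception {
    JobBuilder jb = new JobBuilder(jobID);

    // 1. parse the job conf xml and feed the extracted properties to the builder
    JobConfigurationParser jcp = new JobConfigurationParser(interestedProperties);
    Properties parsedProperties = jcp.parse(confInput);
    jb.process(parsedProperties);

    // 2. parse the job history log and feed each event to the builder
    JobHistoryParser parser = new Hadoop20JHParser(historyInput);
    try {
      HistoryEvent e;
      while ((e = parser.nextEvent()) != null) {
        jb.process(e);
      }
    } finally {
      parser.close();
    }

    // 3. build the LoggedJob once both inputs have been processed
    return jb.build();
  }
}
```

Note that, per the description above, the conf file and the history file could just as well be processed in the opposite order before build() is called.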

From the reading side, the output produced by TraceBuilder or Folder can be read through
JobTraceReader or ClusterTopologyReader. One can also use Jackson's ObjectMapper to parse
the JSON-formatted data into LoggedJob or LoggedTopology objects.
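A reading-side sketch, under stated assumptions: a JobTraceReader constructor taking a Path plus a Configuration, a getNext() method returning the next LoggedJob (or null at end of trace), and a getJobID() accessor on LoggedJob are all assumed here rather than confirmed by the text above.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.tools.rumen.JobTraceReader;
import org.apache.hadoop.tools.rumen.LoggedJob;

public class TraceDumpSketch {
  public static void main(String[] args) throws Exception {
    // args[0]: path to a trace file produced by TraceBuilder or Folder
    Configuration conf = new Configuration();
    JobTraceReader reader = new JobTraceReader(new Path(args[0]), conf);
    try {
      LoggedJob job;
      // each LoggedJob corresponds to one job in the trace
      while ((job = reader.getNext()) != null) {
        System.out.println(job.getJobID());
      }
    } finally {
      reader.close();
    }
  }
}
```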

> Add documentation to Rumen
> --------------------------
>                 Key: MAPREDUCE-1918
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1918
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: tools/rumen
>    Affects Versions: 0.22.0
>            Reporter: Amar Kamat
>            Assignee: Amar Kamat
>             Fix For: 0.22.0
>         Attachments: mapreduce-1918-v1.3.patch, mapreduce-1918-v1.4.patch, rumen.pdf,
> Add forrest documentation to Rumen tool.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
