hadoop-mapreduce-issues mailing list archives

From "Hong Tang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2151) [rumen] Add a map of jobconf key-value pairs in LoggedJob
Date Tue, 26 Oct 2010 17:58:21 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925063#action_12925063 ]

Hong Tang commented on MAPREDUCE-2151:

bq. I think Hong is referring to configuration parameters that are likely to modify the behaviour
of the job and tasks (e.g mapred.child.* , mapreduce.job.* etc).

No, this is not what this jira intends to solve, but this jira could potentially help. Currently
Rumen extracts from jobconf.xml some key-values specific to the map-reduce layer and converts
them to regular primitive types. I think the extraction of mapred.child.*, mapreduce.job.*,
etc. should continue along this path.

However, we are starting to think about using Rumen output to analyze the performance of
frameworks on top of map-reduce. One example is Pig. Pig will add more information to jobconf.xml
to describe the features being used and compile-time statistics. We need a mechanism in Rumen
to retain such information in an extensible way, which is the primary purpose of this jira.

bq. Also *-default.xml might not be available for reference comparison. 
Correct. That is the main reason we have to make each parsed LoggedJob instance self-contained.

bq. Hmm. But I guess we need to bring in more and more configuration properties soon.
Yes, the set will grow, but not without bound. I think we can support extraction of properties
based on exact match or prefixes.
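
As a rough illustration of extraction by exact match or prefix, here is a minimal sketch. It is not Rumen code: the class JobConfFilterSketch, its methods, and the key pig.example.feature are hypothetical names invented for this example, and the real jobconf parser may be structured quite differently. Selected values are kept as plain strings, as-is.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of Rumen): decides which jobconf
// entries to retain, by exact key match or by key prefix.
class JobConfFilterSketch {
    private final List<String> exactKeys;
    private final List<String> prefixes;

    JobConfFilterSketch(List<String> exactKeys, List<String> prefixes) {
        this.exactKeys = exactKeys;
        this.prefixes = prefixes;
    }

    // True if the key is listed exactly or starts with a configured prefix.
    boolean accepts(String key) {
        if (exactKeys.contains(key)) {
            return true;
        }
        for (String prefix : prefixes) {
            if (key.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }

    // Copies the accepted entries unchanged, preserving iteration order.
    Map<String, String> retain(Map<String, String> jobConf) {
        Map<String, String> kept = new LinkedHashMap<String, String>();
        for (Map.Entry<String, String> entry : jobConf.entrySet()) {
            if (accepts(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Illustrative input only; "pig.example.feature" is a made-up key.
        Map<String, String> jobConf = new LinkedHashMap<String, String>();
        jobConf.put("mapred.job.name", "wordcount");
        jobConf.put("pig.example.feature", "GROUP_BY");
        jobConf.put("io.sort.mb", "100");

        JobConfFilterSketch filter = new JobConfFilterSketch(
                Arrays.asList("mapred.job.name"), Arrays.asList("pig."));
        System.out.println(filter.retain(jobConf));
    }
}
```

Because the set of exact keys and prefixes is fixed up front, the retained map stays bounded even as frameworks such as Pig add more keys under their own prefix.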

bq. Created MAPREDUCE-2153 to get other needed configuration properties in to the trace file.

This seems to be in addition to MAPREDUCE-1658. I suggest you roll the two jiras into one (closing
MR-1658 and rolling the work into MR-2153).

bq. Also created MAPREDUCE-2152 for avoiding TraceBuilder's its own handling of deprecated
configuration properties in favour of Configuration object.
The purpose of this jira is to extend the set of key-values extracted by the jobconf parser
and retain them as-is in the LoggedJob object. So I believe your point is relatively orthogonal
to this jira. FWIW, I am a bit concerned about introducing this dependency between Rumen and
MapReduce, because I think the handling of deprecated conf parameters is not really a core part
of the MapReduce API and could be dropped in the future (which would lead us to move the code
into Rumen, similar to the case of Pre21JobHistoryConstants).

> [rumen] Add a map of jobconf key-value pairs in LoggedJob
> ---------------------------------------------------------
>                 Key: MAPREDUCE-2151
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2151
>             Project: Hadoop Map/Reduce
>          Issue Type: New Feature
>          Components: tools/rumen
>            Reporter: Hong Tang
> It'd be useful to retain application level configuration settings in LoggedJob.

