hadoop-common-dev mailing list archives

From Robert Evans <ev...@yahoo-inc.com>
Subject Re: Automatically Documenting Apache Hadoop Configuration
Date Mon, 05 Dec 2011 19:14:44 GMT
From my work on YARN, trying to document its configs and standardize them, I think writing
anything that automatically detects config values through static analysis is going
to be very difficult. This is because most of the configs in YARN are now built up using
static string concatenation, for example:

public static final String BASE = "yarn.base.";
public static final String CONF = BASE + "config";
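
To illustrate the difficulty, here is a minimal sketch of a naive extractor that collects string literals beginning with `yarn.` from source text. The class `LiteralScan` and its regex are hypothetical, not part of any Hadoop tooling. Run against declarations like the ones above, it finds only the prefix and never the full key:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LiteralScan {
    // Hypothetical naive extractor: every double-quoted literal starting with "yarn."
    static final Pattern YARN_LITERAL = Pattern.compile("\"(yarn\\.[^\"]*)\"");

    static List<String> scan(String source) {
        List<String> found = new ArrayList<>();
        Matcher m = YARN_LITERAL.matcher(source);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String source =
              "public static final String BASE = \"yarn.base.\";\n"
            + "public static final String CONF = BASE + \"config\";\n";
        // Prints [yarn.base.]: "yarn.base.config" never appears as a single literal.
        System.out.println(scan(source));
    }
}
```

Resolving `BASE + "config"` requires tracking identifiers and folding constants, which is real parsing rather than pattern matching.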

I am not sure there is a good way around this short of using a full Java parser to trace
out all method calls and try to resolve their parameters. I know this is possible; it is just not
that simple to do.
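
One cheaper route, which is a different technique from the full parser mentioned above and not something proposed in this thread, is to let the compiler and JVM do the concatenation: load the class that holds the key constants and read its public static String fields by reflection. A minimal sketch, with a hypothetical `YarnKeys` class standing in for a real constants class:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

public class ReflectKeys {
    // Hypothetical constants class following the concatenation pattern above.
    public static class YarnKeys {
        public static final String BASE = "yarn.base.";
        public static final String CONF = BASE + "config";
    }

    // Resolved values of every public static String field. The concatenation has
    // already been evaluated (by javac or during class initialization) when we read them.
    static List<String> keysOf(Class<?> keysClass) {
        List<String> keys = new ArrayList<>();
        for (Field f : keysClass.getFields()) {
            if (Modifier.isStatic(f.getModifiers()) && f.getType() == String.class) {
                try {
                    keys.add((String) f.get(null));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        // getFields() order is unspecified, so the two values may print in either order.
        System.out.println(keysOf(YarnKeys.class));
    }
}
```

This only sees keys stored in constants. Names assembled inside methods at runtime still escape it, and prefixes such as `yarn.base.` are returned alongside real keys, so a real tool would need a filter.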

I am +1 for anything that will clean up configs and improve their documentation, even
if we have to rewire or rewrite a lot of the Configuration class to make things work properly.

--Bobby Evans

On 12/5/11 11:54 AM, "Harsh J" <harsh@cloudera.com> wrote:

On 05-Dec-2011, at 10:14 PM, Praveen Sripati wrote:

> Hi,
> Recently there was a query about making the Hadoop framework tolerate
> map/reduce task failures while still completing the job. The solution was to
> set the 'mapreduce.map.failures.maxpercent' and
> 'mapreduce.reduce.failures.maxpercent' properties. Although this feature
> was introduced a couple of years ago, it was not documented. I had a similar
> experience with the 0.23 release as well.
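
For readers unfamiliar with these two properties, here is a minimal sketch of the tolerance check they control. It rests on an assumption, not on the Hadoop source: each value is taken to be the largest percentage of map (or reduce) tasks that may fail without failing the whole job. `jobSurvives` is a hypothetical helper:

```java
public class FailureTolerance {
    // Hypothetical helper. ASSUMPTION: maxPercent is the largest percentage of
    // tasks (of one type) allowed to fail before the whole job is failed.
    static boolean jobSurvives(int failedTasks, int totalTasks, int maxPercent) {
        if (totalTasks <= 0) {
            throw new IllegalArgumentException("totalTasks must be positive");
        }
        // Compare failed/total <= maxPercent/100 in integer arithmetic (no rounding).
        return (long) failedTasks * 100 <= (long) maxPercent * totalTasks;
    }

    public static void main(String[] args) {
        System.out.println(jobSurvives(3, 100, 5)); // true: 3% failed, 5% allowed
        System.out.println(jobSurvives(6, 100, 5)); // false: 6% failed, 5% allowed
        System.out.println(jobSurvives(1, 100, 0)); // false: no failures tolerated
    }
}
```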

I do not know whether we recommend using config strings directly when there's an API in Job/JobConf
that sets the same thing. I'm just saying that Javadoc was already available
on this. But of course, it would be better if the tutorial covered this too. Doc patches welcome!

> It would be really good for Hadoop adoption to automatically discover and
> document all the existing configurable properties in Hadoop, and also to
> identify newly added properties in a particular release during the build
> process. Documentation would also lead to fewer queries in the forums.
> Cloudera has done something similar [1]; though it's not 100% accurate, it
> would definitely help to some extent.
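
Identifying newly added properties during the build could start from a set difference over the `<name>` elements of two releases' `*-default.xml` files. A minimal sketch using only the JDK's DOM parser; `NewProperties` and the sample property names are hypothetical:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NewProperties {
    // The <name> of every <property> in a Hadoop-style configuration XML document.
    static Set<String> propertyNames(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Set<String> names = new TreeSet<>();
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                NodeList n = ((Element) props.item(i)).getElementsByTagName("name");
                if (n.getLength() > 0) {
                    names.add(n.item(0).getTextContent().trim());
                }
            }
            return names;
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable configuration XML", e);
        }
    }

    // Properties present in the newer release but absent from the older one.
    static Set<String> added(String oldXml, String newXml) {
        Set<String> result = propertyNames(newXml);
        result.removeAll(propertyNames(oldXml));
        return result;
    }

    public static void main(String[] args) {
        String v1 = "<configuration>"
            + "<property><name>example.old.key</name><value>1</value></property>"
            + "</configuration>";
        String v2 = "<configuration>"
            + "<property><name>example.old.key</name><value>1</value></property>"
            + "<property><name>example.new.key</name><value>2</value></property>"
            + "</configuration>";
        System.out.println(added(v1, v2)); // [example.new.key]
    }
}
```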

I'm +1 for this. Today, when we find undocumented properties, we request and consistently add
entries for them to the *-default.xml files. I think we should also enforce this at the review
level, so that patches do not go in undocumented; at minimum, patches that make configuration tweaks.
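
A review-level check could be as small as a lint that flags any `<property>` in a `*-default.xml` file with a missing or blank `<description>`. A minimal sketch, assuming each `property` element carries `name`, `value`, and `description` children; `DescriptionLint` is hypothetical, not an existing Hadoop build step:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DescriptionLint {
    // Trimmed text of the first descendant element with the given tag, or null if absent.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    // Names of properties whose <description> is missing or blank, in document order.
    static List<String> undocumented(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("unparseable configuration XML", e);
        }
        List<String> missing = new ArrayList<>();
        NodeList props = doc.getElementsByTagName("property");
        for (int i = 0; i < props.getLength(); i++) {
            Element prop = (Element) props.item(i);
            String desc = childText(prop, "description");
            if (desc == null || desc.isEmpty()) {
                missing.add(childText(prop, "name"));
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
            + "<property><name>example.documented.key</name><value>1</value>"
            + "<description>An example property.</description></property>"
            + "<property><name>example.bare.key</name><value>2</value></property>"
            + "</configuration>";
        // A build or review step would fail when this list is non-empty.
        System.out.println(undocumented(xml)); // [example.bare.key]
    }
}
```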
