Date: Tue, 6 Dec 2011 13:49:32 +0530
Subject: Re: Automatically Documenting Apache Hadoop Configuration
From: Praveen Sripati <praveensripati@gmail.com>
To: cdh-dev@cloudera.org
Cc: mapreduce-dev@hadoop.apache.org, common-dev@hadoop.apache.org

Hi,

> From my work on yarn trying to document the configs there and to
> standardize them, writing anything that is going to automatically
> detect config values through static analysis is going to be very
> difficult. This is because most of the configs in yarn are now built
> up using static string concatenation.

All the references to the Configuration.get* methods will give the list
of parameters, from which the unique ones have to be picked and mapped
to their literal strings (like dfs.namenode.safemode.threshold-pct for
DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY). We could also add annotations
to the configuration parameters, which would be included in the
documentation. We can take a crack at it. If the parameters come out
accurately, then HTML can be generated automatically, similar to
javadocs; otherwise, all the newly added parameters will be written to
a file, which will be an input for the RM (or someone else) to open
JIRAs and fix them.
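To make the annotation idea concrete, it could look something like the
sketch below. The @ConfigProperty annotation and its fields are made up
here - nothing like it exists in Hadoop today - and a doclet or
annotation processor would collect the entries at build time:

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;

    /** Hypothetical annotation for documenting config keys in source. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface ConfigProperty {
      String description();
      String defaultValue() default "";
    }

    // Usage on an existing key constant:
    class DFSConfigKeys {
      @ConfigProperty(
          description = "Percentage of blocks that should satisfy the"
              + " minimal replication requirement before the NameNode"
              + " leaves safe mode.",
          defaultValue = "0.999f")
      public static final String DFS_NAMENODE_SAFEMODE_THRESHOLD_PCT_KEY =
          "dfs.namenode.safemode.threshold-pct";
    }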
> I do not know if we recommend using config strings directly when
> there's an API in Job/JobConf supporting setting the same thing.

Changing the parameter through the API means building and packaging the
job multiple times. Setting the parameters from the command prompt
instead will make testing easier.

Ari from Cloudera, the author of the article, mentioned in a separate
mail that he would release the code; once it's done, I will look into
it.

Regards,
Praveen

On Tue, Dec 6, 2011 at 12:52 AM, Harsh J wrote:

> I've seen Oozie do that same break-up of config param names, and boy,
> it's difficult to grep in such a code base when troubleshooting.
>
> OTOH, we at least get a sane prefix for relevant config names (hope
> we do?)
>
> On 06-Dec-2011, at 12:44 AM, Robert Evans wrote:
>
> > From my work on yarn trying to document the configs there and to
> > standardize them, writing anything that is going to automatically
> > detect config values through static analysis is going to be very
> > difficult. This is because most of the configs in yarn are now
> > built up using static string concatenation.
> >
> >     public static String BASE = "yarn.base.";
> >     public static String CONF = BASE + "config";
> >
> > I am not sure that there is a good way around this, short of using
> > a full Java parser to trace out all method calls and try to resolve
> > the parameters. I know this is possible, just not that simple to do.
> >
> > I am +1 for anything that will clean up configs and improve their
> > documentation, even if we have to rewire or rewrite a lot of the
> > Configuration class to make things work properly.
> >
> > --Bobby Evans
> >
> > On 12/5/11 11:54 AM, "Harsh J" wrote:
> >
> > Praveen,
> >
> > (Inline.)
> >
> > On 05-Dec-2011, at 10:14 PM, Praveen Sripati wrote:
> >
> >> Hi,
> >>
> >> Recently there was a query about the Hadoop framework being
> >> tolerant of map/reduce task failures towards job completion. The
> >> solution was to set the 'mapreduce.map.failures.maxpercent' and
> >> 'mapreduce.reduce.failures.maxpercent' properties. Although this
> >> feature was introduced a couple of years back, it was not
> >> documented. I had a similar experience with the 0.23 release as
> >> well.
> >
> > I do not know if we recommend using config strings directly when
> > there's an API in Job/JobConf supporting setting the same thing.
> > Just saying - that there was javadoc already available on this. But
> > of course, it would be better if the tutorial covered this too.
> > Doc-patches welcome!
> >
> >> It would be really good for Hadoop adoption to automatically dig
> >> up and document all the existing configurable properties in Hadoop,
> >> and also to identify newly added properties in a particular release
> >> during the build process. Documentation would also lead to fewer
> >> queries in the forums. Cloudera has done something similar [1];
> >> though it's not 100% accurate, it would definitely help to some
> >> extent.
> >
> > I'm +1 for this. We do request and consistently add entries to
> > *-default.xml files if we find them undocumented today. I think we
> > should also enforce it at the review level, so that patches do not
> > go in undocumented -- at minimum for the configuration tweaks.
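P.S. To make the API-vs-command-line point above concrete, here is a
rough sketch of a Tool-based driver (MyDriver and the job wiring are
made up for illustration). Hard-coding the value through the JobConf
API bakes it into the jar, while going through ToolRunner lets the same
value be supplied with -D at submit time, with no rebuild needed:

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
        // getConf() already contains any -D overrides parsed by
        // ToolRunner/GenericOptionsParser.
        JobConf conf = new JobConf(getConf(), MyDriver.class);

        // Hard-coded alternative: allow 5% of map tasks to fail.
        // Changing this means rebuilding and repackaging the job.
        // conf.setMaxMapTaskFailuresPercent(5);

        // ... set mapper, reducer, input/output paths here ...

        JobClient.runJob(conf);
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new MyDriver(), args));
      }
    }

With the hard-coded line left commented out, the same knob can be
turned at submit time:

    hadoop jar myjob.jar MyDriver \
        -D mapreduce.map.failures.maxpercent=5 <input> <output>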