From: Susheel Kumar Gadalay <skgadalay@gmail.com>
To: user@hadoop.apache.org
Date: Wed, 10 Sep 2014 10:11:34 +0530
Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly

I am interested in job-related configuration properties.

I have a mix of EC2 instance types - m1.small and m1.medium. I am not
clear which properties in mapred-site.xml and yarn-site.xml are server
side and which are client side.

On the resource manager node (an m1.medium EC2 instance) I edited the
configuration and set

  yarn.app.mapreduce.am.resource.mb=256 (default is 1536)
  mapreduce.map.memory.mb=256 (default is 1 GB)
  mapreduce.reduce.memory.mb=256 (default is 1 GB)
  mapreduce.map.speculative=false (default is true)
  mapreduce.job.reduce.slowstart.completedmaps=0.8 (default is 0.05)

and some more. When I look at the _conf.xml of the job under the HDFS
staging directory /tmp/hadoop-yarn/staging//.staging//_conf.xml, I see
that some values are accepted and some are not: mapreduce.map.memory.mb
and mapreduce.reduce.memory.mb show the modified values, but
yarn.app.mapreduce.am.resource.mb still shows the default of 1536,
mapreduce.map.speculative still shows the default of true, and
mapreduce.job.reduce.slowstart.completedmaps still shows the default of
0.05.
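As far as I can tell, all of the above are per-job properties: they are
resolved from the configuration files on the classpath of the node that
submits the job and then serialized into the job's _conf.xml, so which
values take effect depends on where the job is submitted from. They can
also be forced programmatically in the driver; a minimal sketch (the
class and job names are hypothetical, assuming a plain Hadoop 2.x
MapReduce client):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class SubmitWithOverrides {
      public static void main(String[] args) throws Exception {
          Configuration conf = new Configuration();
          // Per-job overrides: these are serialized into the job's
          // _conf.xml at submission time.
          conf.setInt("yarn.app.mapreduce.am.resource.mb", 256);
          conf.setInt("mapreduce.map.memory.mb", 256);
          conf.setInt("mapreduce.reduce.memory.mb", 256);
          conf.setBoolean("mapreduce.map.speculative", false);
          conf.setFloat("mapreduce.job.reduce.slowstart.completedmaps", 0.80f);

          Job job = Job.getInstance(conf, "test-job");
          // ... set jar, mapper, reducer and input/output paths as usual ...
          System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
  }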
To forcefully set these new values I am now sending the properties from
the client on the command line:

  hadoop jar ... \
      -D mapreduce.job.reduce.slowstart.completedmaps=0.80 \
      -D mapreduce.map.speculative=false \
      ...
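One thing I noticed: the -D generic options are only applied if the
driver parses them with GenericOptionsParser, which happens
automatically when the main class implements Tool and is launched
through ToolRunner. A minimal sketch (the class name is hypothetical):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.conf.Configured;
  import org.apache.hadoop.util.Tool;
  import org.apache.hadoop.util.ToolRunner;

  public class MyDriver extends Configured implements Tool {
      @Override
      public int run(String[] args) throws Exception {
          Configuration conf = getConf(); // -D overrides already applied
          // ... build and submit the job using this conf ...
          return 0;
      }

      public static void main(String[] args) throws Exception {
          System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
      }
  }

If main() builds its own Configuration and never goes through
ToolRunner, the -D arguments arrive as ordinary program arguments and
have no effect on the job configuration.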
There is no good document giving the distinction between which
properties are client side and which are server side.

TIA
Susheel Kumar

On 9/9/14, java8964 wrote:
> The configuration in fact depends on the XML files. I am not sure what
> kind of cluster configuration variables/values you are looking for.
> Remember, the cluster is made of a set of computers, and in Hadoop
> there are the hdfs, mapred and even yarn XML files.
> mapred-site.xml and yarn-site.xml are job related. Without a concrete
> job, no detailed configuration can be given.
> About the HDFS configuration: the cluster is a set of computers, and
> in theory there is nothing wrong with each computer having different
> configuration settings. Every computer could have different CPU cores,
> memory, disk counts, mount names etc. When you ask for the
> configuration variables/values, which one should be returned?
> Yong
>
> Date: Tue, 9 Sep 2014 10:01:14 -0700
> Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml
> doesn't set it explicitly
> From: nidmgg@gmail.com
> To: user@hadoop.apache.org
>
> Susheel actually brought up a good point.
>
> Once the client code connects to the cluster, is there a way to get
> the real cluster configuration variables/values instead of relying on
> the .xml files on the client side?
>
> Demai
>
> On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay wrote:
> > One doubt on building the Configuration object.
> >
> > I have a Hadoop remote client and a Hadoop cluster.
> > When a client submits an MR job, the Configuration object is built
> > from the Hadoop cluster node XML files, basically the resource
> > manager node's core-site.xml, mapred-site.xml and yarn-site.xml.
> > Am I correct?
> >
> > TIA
> > Susheel Kumar
> >
> > On 9/9/14, Bhooshan Mogal wrote:
> >> Hi Demai,
> >>
> >> conf = new Configuration()
> >>
> >> will create a new Configuration object and only add the properties
> >> from core-default.xml and core-site.xml to the conf object.
> >>
> >> This is basically a new configuration object, not the same one that
> >> the daemons in the hadoop cluster use.
> >>
> >> I think what you are trying to ask is whether you can get the
> >> Configuration object that a daemon in your live cluster (e.g. the
> >> datanode) is using. I am not sure if the datanode or any other
> >> daemon on a hadoop cluster exposes such an API.
> >>
> >> I would in fact be tempted to get this information from the
> >> configuration management daemon instead - in your case Cloudera
> >> Manager. But I am not sure if CM exposes that API either. You could
> >> probably find out on the Cloudera mailing list.
> >>
> >> HTH,
> >> Bhooshan
> >>
> >> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni wrote:
> >>> hi, Bhooshan,
> >>>
> >>> thanks for your kind response. I run the code on one of the data
> >>> nodes of my cluster, with only one hadoop daemon running. I
> >>> believe my java client code connects to the cluster correctly, as
> >>> I am able to retrieve fileStatus, list files under a particular
> >>> hdfs path, and do similar things... However, you are right that
> >>> the daemon process uses the hdfs-site.xml under another folder for
> >>> cloudera:
> >>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml
> >>>
> >>> About "retrieving the info from a live cluster": I would like to
> >>> get information beyond the configuration files (that is, beyond
> >>> the .xml files).
> >>> Since I am able to use
> >>>
> >>> conf = new Configuration()
> >>>
> >>> to connect to hdfs and do other operations, shouldn't I be able to
> >>> retrieve the configuration variables?
> >>>
> >>> Thanks
> >>>
> >>> Demai
> >>>
> >>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal wrote:
> >>>> Hi Demai,
> >>>>
> >>>> When you read a property from the conf object, it will only have
> >>>> a value if the conf object contains that property.
> >>>>
> >>>> In your case, you created the conf object as new Configuration()
> >>>> -- this adds core-default.xml and core-site.xml.
> >>>>
> >>>> Then you added site XML files (hdfs-site.xml and core-site.xml)
> >>>> from specific locations. If none of these files define
> >>>> dfs.data.dir, then you will get NULL. This is expected behavior.
> >>>>
> >>>> What do you mean by retrieving the info from a live cluster? Even
> >>>> for processes like the datanode and namenode, the source of truth
> >>>> for these properties is hdfs-site.xml. It is loaded from a
> >>>> specific location when you start these services.
> >>>>
> >>>> Question: where are you running the above code? Is it on a node
> >>>> which has other hadoop daemons as well?
> >>>>
> >>>> My guess is that the path you are referring to
> >>>> (/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right
> >>>> path where these config properties are defined. Since this is a
> >>>> CDH cluster, you would probably be best served by asking on the
> >>>> CDH mailing list where the right path to these files is.
> >>>>
> >>>> HTH,
> >>>> Bhooshan
> >>>>
> >>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni wrote:
> >>>>> hi, experts,
> >>>>>
> >>>>> I am trying to get the local filesystem directory of the data
> >>>>> node. My cluster is using CDH 5.x (Hadoop 2.3) with the default
> >>>>> configuration, so the datanode directory is file:///dfs/dn. I
> >>>>> didn't specify the value in hdfs-site.xml.
> >>>>>
> >>>>> My code is something like:
> >>>>>
> >>>>> conf = new Configuration();
> >>>>>
> >>>>> // tested both with and without the following two lines
> >>>>> conf.addResource(new
> >>>>>     Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>>> conf.addResource(new
> >>>>>     Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>>
> >>>>> // I also tried get("dfs.datanode.data.dir"), which also
> >>>>> // returns NULL
> >>>>> String dnDir = conf.get("dfs.data.dir"); // returns NULL
> >>>>>
> >>>>> It looks like get() only looks at the configuration files,
> >>>>> instead of retrieving the info from the live cluster?
> >>>>>
> >>>>> Many thanks for your help in advance.
> >>>>>
> >>>>> Demai
> >>>>
> >>>> --
> >>>> Bhooshan
> >>
> >> --
> >> Bhooshan
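PS: Coming back to Demai's question above - as far as I know,
Configuration.get() never contacts the cluster; it only returns what
was loaded into that instance from its default and added resources. A
bare new Configuration() loads only core-default.xml and core-site.xml,
which is why the dfs.* lookups come back null. A small sketch of the
difference, assuming hadoop-hdfs (and therefore hdfs-default.xml) is on
the classpath:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hdfs.HdfsConfiguration;

  public class ConfProbe {
      public static void main(String[] args) {
          // Only core-default.xml and core-site.xml are loaded here,
          // so HDFS properties are absent:
          Configuration coreOnly = new Configuration();
          System.out.println(coreOnly.get("dfs.datanode.data.dir")); // null

          // HdfsConfiguration also registers hdfs-default.xml and
          // hdfs-site.xml, so the shipped default
          // (file://${hadoop.tmp.dir}/dfs/data) becomes visible:
          Configuration withHdfs = new HdfsConfiguration();
          System.out.println(withHdfs.get("dfs.datanode.data.dir"));
      }
  }

Also, if I remember correctly, each Hadoop daemon serves the
configuration it actually loaded over HTTP at /conf on its web UI port
(for example http://<datanode-host>:50075/conf on Hadoop 2.x), which is
probably the closest thing to reading the live values from a running
cluster.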