Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
From: Bhooshan Mogal <bhooshan.mogal@gmail.com>
To: user@hadoop.apache.org
Date: Mon, 8 Sep 2014 16:58:50 -0700
Hi Demai,

conf = new Configuration()

will create a new Configuration object and only add the properties from core-default.xml and core-site.xml to the conf object.

This is basically a new configuration object, not the same one that the daemons in the Hadoop cluster use.
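A minimal, stdlib-only sketch of the behavior described above. `LayeredConfig` is a hypothetical class, not Hadoop's API. It only mirrors the shape of `Configuration.get(name)` and `get(name, defaultValue)`: an object knows only the resources that were added to it, later resources override earlier ones, and a key that no loaded resource defines comes back as null. Real details such as final properties and `${var}` expansion are deliberately ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, stdlib-only model of how a Hadoop Configuration answers get().
 * It ignores real details such as final properties and ${var} expansion.
 */
final class LayeredConfig {
    // Resources in the order they were added (e.g. core-default.xml, then core-site.xml).
    private final List<Map<String, String>> resources = new ArrayList<>();

    /** Adds one resource; properties in later resources override earlier ones. */
    LayeredConfig addResource(Map<String, String> properties) {
        resources.add(new HashMap<>(properties));
        return this;
    }

    /** Returns the value from the last resource that defines the key, or null if none does. */
    String get(String key) {
        for (int i = resources.size() - 1; i >= 0; i--) {
            String value = resources.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    /** Same as get(key), but falls back to a caller-supplied default instead of null. */
    String get(String key, String defaultValue) {
        String value = get(key);
        return value != null ? value : defaultValue;
    }

    public static void main(String[] args) {
        // Client side: only what this process loaded itself (values are illustrative).
        LayeredConfig client = new LayeredConfig()
                .addResource(Map.of("fs.defaultFS", "file:///"))                          // core-default.xml
                .addResource(Map.of("fs.defaultFS", "hdfs://namenode.example.com:8020")); // core-site.xml
        // DataNode side: a separate object, built from the hdfs-site.xml it was started with.
        LayeredConfig datanode = new LayeredConfig()
                .addResource(Map.of("dfs.datanode.data.dir", "file:///dfs/dn"));

        System.out.println(client.get("fs.defaultFS"));            // hdfs://namenode.example.com:8020
        System.out.println(client.get("dfs.datanode.data.dir"));   // null
        System.out.println(datanode.get("dfs.datanode.data.dir")); // file:///dfs/dn
    }
}
```

The two objects never share state, which is the point being made: a client that can talk to HDFS still only sees the properties in the files it loaded itself.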



I think what you are trying to ask is whether you can get the Configuration object that a daemon in your live cluster (e.g. the datanode) is using. I am not sure whether the datanode or any other daemon on a Hadoop cluster exposes such an API.
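One thing that may be worth checking: Hadoop daemons' built-in web servers normally register a `/conf` servlet (`org.apache.hadoop.conf.ConfServlet`) that returns the configuration the running daemon holds, as XML. Assuming it is reachable on your cluster, the sketch below fetches it and extracts one property. `DaemonConfProbe` is a hypothetical class, the host name is a placeholder, the port assumes the Hadoop 2.x default for `dfs.datanode.http.address` (50075), and a secured (Kerberos/SPNEGO) cluster would need authentication that this sketch does not handle.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper: reads one property from a daemon's /conf output (or any *-site.xml). */
final class DaemonConfProbe {

    /** Returns the value of {@code key} in Hadoop-style configuration XML, or null if it is absent. */
    static String lookup(InputStream xml, String key) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(xml);
            NodeList properties = doc.getElementsByTagName("property");
            String result = null;
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                if (key.equals(childText(property, "name"))) {
                    result = childText(property, "value"); // a later definition replaces an earlier one
                }
            }
            return result;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    // Text of the first <tag> element under the given <property>, trimmed; null if missing.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** Fetches http://host:port/conf from a running daemon and looks up one key (no authentication). */
    static String fetch(String host, int port, String key) throws IOException {
        URL url = URI.create("http://" + host + ":" + port + "/conf").toURL();
        try (InputStream in = url.openStream()) {
            return lookup(in, key);
        }
    }

    public static void main(String[] args) throws IOException {
        // Placeholder host; 50075 assumes the Hadoop 2.x default dfs.datanode.http.address port.
        System.out.println(fetch("datanode.example.com", 50075, "dfs.datanode.data.dir"));
    }
}
```

Because `lookup` accepts any `InputStream`, it can also be pointed at a local file through `Files.newInputStream(...)`, for example the process-specific hdfs-site.xml that a Cloudera-managed daemon is started with.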

I would in fact be tempted to get this information from the configuration management daemon instead, in your case Cloudera Manager (CM). But I am not sure whether CM exposes that API either; you could probably find out on the Cloudera mailing list.


HTH,
Bhooshan


On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <nidmgg@gmail.com> wrote:
Hi Bhooshan,

Thanks for your kind response. I run the code on one of the data nodes of my cluster, which has only one Hadoop daemon running. I believe my Java client code connects to the cluster correctly, as I am able to retrieve FileStatus objects, list files under a particular HDFS path, and do similar things. However, you are right that the daemon process uses an hdfs-site.xml in another folder for Cloudera:
/var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml
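Given that the running datanode reads its hdfs-site.xml from a per-process directory like the one above, one hedged way to locate the current file is to pick the `NN-hdfs-DATANODE` directory with the largest number. That "largest number means newest process" rule is an unverified assumption, `CmProcessDirs` is a hypothetical class, and listing that directory may require root or the hdfs user.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for locating the hdfs-site.xml of the current DataNode process. */
final class CmProcessDirs {
    // Directory names like "90-hdfs-DATANODE", following the path quoted above.
    private static final Pattern DATANODE_DIR = Pattern.compile("(\\d+)-hdfs-DATANODE");

    /** Picks the DataNode directory with the largest numeric prefix (assumed to be the newest). */
    static Optional<String> newestDatanodeDir(List<String> dirNames) {
        String best = null;
        long bestId = -1;
        for (String name : dirNames) {
            Matcher m = DATANODE_DIR.matcher(name);
            if (m.matches()) {
                long id = Long.parseLong(m.group(1));
                if (id > bestId) {
                    bestId = id;
                    best = name;
                }
            }
        }
        return Optional.ofNullable(best);
    }

    /** Lists processRoot and returns <newest DataNode dir>/hdfs-site.xml, if such a dir exists. */
    static Optional<Path> datanodeSiteXml(Path processRoot) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(processRoot)) {
            for (Path entry : entries) {
                names.add(entry.getFileName().toString());
            }
        }
        return newestDatanodeDir(names).map(name -> processRoot.resolve(name).resolve("hdfs-site.xml"));
    }

    public static void main(String[] args) throws IOException {
        // May need root or the hdfs user to list this directory.
        System.out.println(datanodeSiteXml(Paths.get("/var/run/cloudera-scm-agent/process")));
    }
}
```

The numeric comparison matters: a plain string sort would rank "9-hdfs-DATANODE" above "90-hdfs-DATANODE".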

About "retrieving the info from a live cluster": I would like to get the information beyond the configuration files (that is, beyond the .xml files). Since I am able to use
conf = new Configuration()
to connect to HDFS and do other operations, shouldn't I be able to retrieve the configuration variables?

Thanks

Demai


On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bhooshan.mogal@gmail.com> wrote:
Hi Demai,

When you read a property from the conf object, it will only have a value if the conf object contains that property.

In your case, you created the conf object as new Configuration(), which adds core-default.xml and core-site.xml.

Then you added site files (hdfs-site.xml and core-site.xml) from specific locations. If none of these files defines dfs.data.dir, you will get null. This is expected behavior.

What do you mean by retrieving the info from a live cluster? Even for processes like the datanode, namenode, etc., the source of truth for these properties is hdfs-site.xml, which is loaded from a specific location when you start these services.

Question: Where are you running the above code? Is it on a node that has other Hadoop daemons as well?

My guess is that the path you are referring to (/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right path where these config properties are defined. Since this is a CDH cluster, you would probably be best served by asking on the CDH mailing list where the right path to these files is.


HTH,
Bhooshan


On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <nidmgg@gmail.com> wrote:
Hi experts,

I am trying to get the local filesystem directory of the datanode. My cluster uses CDH 5.x (Hadoop 2.3) with the default configuration, so the datanode directory is file:///dfs/dn. I didn't specify the value in hdfs-site.xml.
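Once a value has been obtained from whichever configuration the datanode actually uses, it still has to be turned into local directories. In Hadoop 2, dfs.data.dir is the older, deprecated name of dfs.datanode.data.dir, and the value is a comma-separated list of directories, each optionally written as a file:// URI. The sketch below uses a hypothetical `DataDirs` class that works on an already-loaded key/value map (not on Hadoop's Configuration). It assumes any `${...}` variables have already been expanded, which Hadoop's `Configuration.get()` does for you.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: turns the DataNode directory setting into local paths. */
final class DataDirs {

    /**
     * Reads dfs.datanode.data.dir, falling back to the older name dfs.data.dir, from an
     * already-loaded key/value map, and converts each comma-separated entry to a Path.
     * Assumes ${...} variables have already been expanded.
     */
    static List<Path> localDataDirs(Map<String, String> props) {
        String value = props.get("dfs.datanode.data.dir");
        if (value == null) {
            value = props.get("dfs.data.dir");
        }
        List<Path> dirs = new ArrayList<>();
        if (value == null) {
            return dirs; // neither key is defined: nothing to report
        }
        for (String entry : value.split(",")) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // Entries may be file: URIs ("file:///dfs/dn") or plain paths ("/data/2/dfs/dn").
            dirs.add(trimmed.startsWith("file:") ? Paths.get(URI.create(trimmed)) : Paths.get(trimmed));
        }
        return dirs;
    }

    public static void main(String[] args) {
        System.out.println(localDataDirs(Map.of("dfs.datanode.data.dir", "file:///dfs/dn")));
    }
}
```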

My code is something like:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
// test both with and without the following two lines
conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
// I also tried get("dfs.datanode.data.dir"), which also returns null
String dnDir = conf.get("dfs.data.dir"); // returns null
It looks like get() only looks at the configuration files instead of retrieving the info from the live cluster?

Many thanks for your help in advance.

Demai



--
Bhooshan




--
Bhooshan