hadoop-mapreduce-user mailing list archives

From Demai Ni <nid...@gmail.com>
Subject Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
Date Tue, 09 Sep 2014 17:01:14 GMT
Susheel actually brought up a good point.

Once the client code connects to the cluster, is there a way to get the real
cluster configuration variables/values instead of relying on the .xml files
on the client side?

Demai
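
One possible way to read a running daemon's effective settings, offered as a hedged sketch rather than a confirmed answer from this thread: Hadoop daemons generally serve their loaded configuration as XML from a /conf page on their embedded web server. The helper below is hypothetical (DaemonConf is not a Hadoop class) and uses only the JDK. It parses the usual <configuration><property><name>/<value> layout into a map. The host and port in the usage note are assumptions to adjust for your cluster.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper (not part of Hadoop): reads Hadoop-style configuration XML. */
class DaemonConf {

    /** Parses <configuration><property><name/><value/></property>...</configuration>. */
    static Map<String, String> parse(InputStream in) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
            NodeList props = doc.getElementsByTagName("property");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < props.getLength(); i++) {
                Element property = (Element) props.item(i);
                String name = childText(property, "name");
                if (name != null) {
                    result.put(name, childText(property, "value"));
                }
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration XML", e);
        }
    }

    /** Convenience overload for XML that is already in memory. */
    static Map<String, String> parse(String xml) {
        return parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Downloads and parses a daemon's configuration page; the URL is supplied by the caller. */
    static Map<String, String> fetch(String confUrl) {
        try (InputStream in = URI.create(confUrl).toURL().openStream()) {
            return parse(in);
        } catch (IOException e) {
            throw new IllegalStateException("could not read " + confUrl, e);
        }
    }

    /** Returns the trimmed text of the first child element with the given tag, or null. */
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

Usage would look something like DaemonConf.fetch("http://<datanode-host>:50075/conf").get("dfs.datanode.data.dir"), where the host is a placeholder and 50075 is assumed to be the DataNode HTTP port (the Hadoop 2 default). On a secured cluster the page may require authentication, which this sketch does not handle.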

On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <skgadalay@gmail.com>
wrote:

> One doubt about building the Configuration object.
>
> I have a remote Hadoop client and a Hadoop cluster.
> When a client submits an MR job, the Configuration object is built
> from the XML files on a Hadoop cluster node, basically the resource
> manager node's core-site.xml, mapred-site.xml and yarn-site.xml.
> Am I correct?
>
> TIA
> Susheel Kumar
>
> On 9/9/14, Bhooshan Mogal <bhooshan.mogal@gmail.com> wrote:
> > Hi Demai,
> >
> > conf = new Configuration()
> >
> > will create a new Configuration object and only add the properties from
> > core-default.xml and core-site.xml to the conf object.
> >
> > This is basically a new configuration object, not the same one that the
> > daemons in the hadoop cluster use.
> >
> >
> >
> > I think what you are trying to ask is if you can get the Configuration
> > object that a daemon in your live cluster (e.g. datanode) is using. I am
> > not sure if the datanode or any other daemon on a hadoop cluster exposes
> > such an API.
> >
> > I would in fact be tempted to get this information from the configuration
> > management daemon instead - in your case cloudera manager. But I am not
> > sure if CM exposes that API either. You could probably find out on the
> > Cloudera mailing list.
> >
> >
> > HTH,
> > Bhooshan
> >
> >
> > On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <nidmgg@gmail.com> wrote:
> >
> >> hi, Bhooshan,
> >>
> >> thanks for your kind response. I ran the code on one of the data nodes
> >> of my cluster, with only one hadoop daemon running. I believe my java
> >> client code connects to the cluster correctly, as I am able to retrieve
> >> fileStatus, list files under a particular hdfs path, and similar things.
> >> However, you are right that the daemon process uses the hdfs-site.xml
> >> under another folder for cloudera:
> >> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
> >>
> >> about "retrieving the info from a live cluster", I would like to get
> >> the information beyond the configuration files (that is, beyond the
> >> .xml files). Since I am able to use:
> >> conf = new Configuration()
> >> to connect to hdfs and do other operations, shouldn't I be able to
> >> retrieve the configuration variables?
> >>
> >> Thanks
> >>
> >> Demai
> >>
> >>
> >> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bhooshan.mogal@gmail.com>
> >> wrote:
> >>
> >>> Hi Demai,
> >>>
> >>> When you read a property from the conf object, it will only have a
> >>> value if the conf object contains that property.
> >>>
> >>> In your case, you created the conf object as new Configuration(), which
> >>> adds core-default.xml and core-site.xml.
> >>>
> >>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from
> >>> specific locations. If none of these files defines dfs.data.dir, then
> >>> you will get NULL. This is expected behavior.
> >>>
> >>> What do you mean by retrieving the info from a live cluster? Even for
> >>> processes like datanode, namenode etc, the source of truth for these
> >>> properties is hdfs-site.xml. It is loaded from a specific location when
> >>> you start these services.
> >>>
> >>> Question: Where are you running the above code? Is it on a node which
> >>> has other hadoop daemons as well?
> >>>
> >>> My guess is that the path you are referring to
> >>> (/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right path
> >>> where these config properties are defined. Since this is a CDH cluster,
> >>> you would probably be best served by asking on the CDH mailing list as
> >>> to where the right path to these files is.
> >>>
> >>>
> >>> HTH,
> >>> Bhooshan
> >>>
> >>>
> >>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <nidmgg@gmail.com> wrote:
> >>>
> >>>> hi, experts,
> >>>>
> >>>> I am trying to get the local filesystem directory of the data node. My
> >>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration. So
> >>>> the datanode is under file:///dfs/dn. I didn't specify the value in
> >>>> hdfs-site.xml.
> >>>>
> >>>> My code is something like:
> >>>>
> >>>> import org.apache.hadoop.conf.Configuration;
> >>>> import org.apache.hadoop.fs.Path;
> >>>>
> >>>> Configuration conf = new Configuration();
> >>>>
> >>>> // test both with and without the following two lines
> >>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
> >>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
> >>>>
> >>>> // I also tried get("dfs.datanode.data.dir"), which also returns NULL
> >>>> String dnDir = conf.get("dfs.data.dir");  // returns NULL
> >>>>
> >>>> It looks like get() only looks at the configuration files instead of
> >>>> retrieving the info from the live cluster?
> >>>>
> >>>> Many thanks for your help in advance.
> >>>>
> >>>> Demai
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Bhooshan
> >>>
> >>
> >>
> >
> >
> > --
> > Bhooshan
> >
>
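
The lookup behavior described in the thread (a property has a value only if one of the loaded resources defines it) can be illustrated with a toy model. This is a sketch using only the JDK, not Hadoop's Configuration class: LayeredConf and its methods are hypothetical names, and it ignores details such as final properties, variable expansion and deprecated key names. As far as I know, the HDFS defaults (hdfs-default.xml) only become visible to a client when something registers them, for example by constructing an HdfsConfiguration instead of a plain Configuration, which would explain why a key with a built-in default can still come back null.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Toy model of layered lookup; an illustration only, not Hadoop's Configuration class. */
class LayeredConf {
    // Each map stands in for one loaded XML resource, in the order it was added.
    private final List<Map<String, String>> resources = new ArrayList<>();

    /** Adds a resource; later resources take precedence over earlier ones. */
    LayeredConf addResource(Map<String, String> resource) {
        resources.add(resource);
        return this;
    }

    /** Returns the value from the most recently added resource that defines the key, else null. */
    String get(String key) {
        for (int i = resources.size() - 1; i >= 0; i--) {
            String value = resources.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // no loaded resource defines the key
    }
}
```

With only a core-site-like resource added, get("dfs.datanode.data.dir") returns null in this model. Once a resource that defines the key is added, the same call returns that resource's value.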
