Subject: Re: conf.get("dfs.data.dir") return null when hdfs-site.xml doesn't set it explicitly
From: Demai Ni
To: "user@hadoop.apache.org"
Date: Tue, 9 Sep 2014 10:01:14 -0700

Susheel actually brought up a good point.

Once the client code connects to the cluster, is there a way to get the real cluster configuration variables/values instead of relying on the .xml files on the client side?

Demai
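
One possible route, offered as a sketch rather than a confirmed answer: Hadoop 2.x daemons generally serve their loaded configuration as XML at a /conf path on their web UI port. Whether that endpoint is enabled, reachable, or secured on a given CDH cluster is an assumption here, and the class name DaemonConfReader is purely illustrative. The code uses only the JDK, so it does not depend on any client-side .xml files.

```java
import java.io.InputStream;
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: reads Hadoop-style <configuration><property>... XML,
// for example the document a daemon is assumed to serve at /conf.
final class DaemonConfReader {

    /** Parses configuration XML into name -> value pairs, in document order. */
    static Map<String, String> parse(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = text(property, "name");
            if (name != null) {
                props.put(name, text(property, "value"));
            }
        }
        return props;
    }

    /** Returns the trimmed text of the first child element with the given tag, or null. */
    private static String text(Element parent, String tag) {
        NodeList list = parent.getElementsByTagName(tag);
        return list.getLength() == 0 ? null : list.item(0).getTextContent().trim();
    }

    /** Fetches the XML at confUrl (assumed to be a daemon's /conf endpoint) and parses it. */
    static Map<String, String> fetch(String confUrl) throws Exception {
        try (InputStream in = URI.create(confUrl).toURL().openStream()) {
            return parse(in);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: DaemonConfReader <conf-url> <property-name>");
            return;
        }
        // Prints "null" when the daemon's configuration does not define the property.
        System.out.println(fetch(args[0]).get(args[1]));
    }
}
```

Usage would look something like `java DaemonConfReader.java http://datanode-host:50075/conf dfs.datanode.data.dir`, where datanode-host is a placeholder and 50075 is the usual default DataNode HTTP port in Hadoop 2.x; a cluster managed by Cloudera Manager may use different ports or require authentication.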

On Mon, Sep 8, 2014 at 10:12 PM, Susheel Kumar Gadalay <skgadalay@gmail.com> wrote:
One doubt on building Configuration object.

I have a Hadoop remote client and Hadoop cluster.
When a client submits an MR job, the Configuration object is built
from the Hadoop cluster node's xml files, basically the resource manager
node's core-site.xml, mapred-site.xml and yarn-site.xml.
Am I correct?

TIA
Susheel Kumar

On 9/9/14, Bhooshan Mogal <bhooshan.mogal@gmail.com> wrote:
> Hi Demai,
>
> conf = new Configuration()
>
> will create a new Configuration object and only add the properties from
> core-default.xml and core-site.xml to the conf object.
>
> This is basically a new configuration object, not the same one that the
> daemons in the hadoop cluster use.
>
>
>
> I think what you are trying to ask is if you can get the Configuration
> object that a daemon in your live cluster (e.g. datanode) is using. I am
> not sure if the datanode or any other daemon on a hadoop cluster exposes
> such an API.
>
> I would in fact be tempted to get this information from the configuration
> management daemon instead - in your case Cloudera Manager. But I am not
> sure if CM exposes that API either. You could probably find out on the
> Cloudera mailing list.
>
>
> HTH,
> Bhooshan
>
>
> On Mon, Sep 8, 2014 at 3:52 PM, Demai Ni <nidmgg@gmail.com> wrote:
>
>> hi, Bhooshan,
>>
>> thanks for your kind response. I ran the code on one of the data nodes of
>> my cluster, with only one hadoop daemon running. I believe my java client
>> code connects to the cluster correctly, as I am able to retrieve
>> fileStatus, list files under a particular hdfs path, and similar things...
>> However, you are right that the daemon process uses the hdfs-site.xml
>> under another folder for cloudera:
>> /var/run/cloudera-scm-agent/process/90-hdfs-DATANODE/hdfs-site.xml.
>>
>> about "retrieving the info from a live cluster", I would like to get the
>> information beyond the configuration files (that is, beyond the .xml
>> files).
>> Since I am able to use:
>> conf = new Configuration()
>> to connect to hdfs and do other operations, shouldn't I be able to
>> retrieve the configuration variables?
>>
>> Thanks
>>
>> Demai
>>
>>
>> On Mon, Sep 8, 2014 at 2:40 PM, Bhooshan Mogal <bhooshan.mogal@gmail.com>
>> wrote:
>>
>>> Hi Demai,
>>>
>>> When you read a property from the conf object, it will only have a value
>>> if the conf object contains that property.
>>>
>>> In your case, you created the conf object as new Configuration(), which
>>> adds core-default.xml and core-site.xml.
>>>
>>> Then you added site.xmls (hdfs-site.xml and core-site.xml) from specific
>>> locations. If none of these files defines dfs.data.dir, then you will
>>> get NULL. This is expected behavior.
>>>
>>> What do you mean by retrieving the info from a live cluster? Even for
>>> processes like datanode, namenode etc, the source of truth for these
>>> properties is hdfs-site.xml. It is loaded from a specific location when
>>> you start these services.
>>>
>>> Question: Where are you running the above code? Is it on a node which
>>> has other hadoop daemons as well?
>>>
>>> My guess is that the path you are referring to
>>> (/etc/hadoop/conf.cloudera.hdfs/core-site.xml) is not the right path where
>>> these config properties are defined. Since this is a CDH cluster, you would
>>> probably be best served by asking on the CDH mailing list as to where the
>>> right path to these files is.
>>>
>>>
>>> HTH,
>>> Bhooshan
>>>
>>>
>>> On Mon, Sep 8, 2014 at 11:47 AM, Demai Ni <nidmgg@gmail.com> wrote:
>>>
>>>> hi, experts,
>>>>
>>>> I am trying to get the local filesystem directory of the data node. My
>>>> cluster is using CDH5.x (hadoop 2.3) and the default configuration, so
>>>> the datanode is under file:///dfs/dn. I didn't specify the value in
>>>> hdfs-site.xml.
>>>>
>>>> My code is something like:
>>>>
>>>> import org.apache.hadoop.conf.Configuration;
>>>> import org.apache.hadoop.fs.Path;
>>>>
>>>> Configuration conf = new Configuration();
>>>>
>>>> // tested both with and without the following two lines
>>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/hdfs-site.xml"));
>>>> conf.addResource(new Path("/etc/hadoop/conf.cloudera.hdfs/core-site.xml"));
>>>>
>>>> // I also tried get("dfs.datanode.data.dir"), which also returns NULL
>>>> String dnDir = conf.get("dfs.data.dir"); // returns NULL
>>>>
>>>> It looks like get() only looks at the configuration files instead of
>>>> retrieving the info from the live cluster?
>>>>
>>>> Many thanks for your help in advance.
>>>>
>>>> Demai
>>>>
>>>
>>>
>>>
>>> --
>>> Bhooshan
>>>
>>
>>
>
>
> --
> Bhooshan
>
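
To make the "expected behavior" described earlier in the thread concrete, here is a deliberately simplified model of layered property lookup. It is not Hadoop's Configuration class: the name LayeredConf is made up for illustration, and it ignores real features such as final properties, deprecated key mapping and variable substitution. The only point it shows is that a lookup can return nothing when no loaded resource defines the key, no matter what the running daemons use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative stand-in for a layered configuration; NOT Hadoop's implementation.
final class LayeredConf {
    // Resources in the order they were added; later ones take precedence here.
    private final List<Map<String, String>> layers = new ArrayList<>();

    /** Adds one resource (think: one parsed *-default.xml or *-site.xml file). */
    LayeredConf addResource(Map<String, String> resource) {
        layers.add(new LinkedHashMap<>(resource));
        return this;
    }

    /** Returns the value from the most recently added resource defining key, or null. */
    String get(String key) {
        for (int i = layers.size() - 1; i >= 0; i--) {
            String value = layers.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // no loaded resource defines the key
    }

    /** Same lookup, but falls back to a caller-supplied default instead of null. */
    String get(String key, String defaultValue) {
        String value = get(key);
        return value != null ? value : defaultValue;
    }
}
```

In this model, a conf built only from core-default and core-site layers returns null for dfs.data.dir, which mirrors what the original code saw. In Hadoop itself, as far as I know, hdfs-default.xml and hdfs-site.xml are only registered as default resources once the HdfsConfiguration class has been loaded, so that is worth checking before concluding the property has no value.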
