hadoop-hdfs-user mailing list archives

From Sai Prasanna <ansaiprasa...@gmail.com>
Subject Re: App Master issue.
Date Fri, 07 Mar 2014 06:23:04 GMT
Hi MJ, extremely sorry for the late response...Had some infrastructure issues.

I am using Hadoop 2.3.0. Actually, while trying to solve this AppMaster
issue, I came up with a strange observation: the AppMaster gets a "STICKY
SLOT" on the data node at the master node only if I set the following
parameters along with *yarn.resourcemanager.hostname*:

*yarn.resourcemanager.address to master:8034*
*yarn.resourcemanager.scheduler.address to master:8030*
*yarn.resourcemanager.resource-tracker.address to master:8025* across all
the slave nodes.
The default values can be found here...
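For reference, this is roughly how those overrides look in yarn-site.xml on each slave (a sketch from my setup; "master" must resolve to the ResourceManager host, and the ports are the ones I chose above, not defaults):

```xml
<configuration>
  <!-- Host name of the ResourceManager; slaves derive RM addresses from it. -->
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
  <!-- Explicit RM client address (port 8034 is my choice, not the default). -->
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8034</value>
  </property>
  <!-- Address the AppMaster uses to talk to the scheduler. -->
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <!-- Address node managers use to report in to the RM. -->
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8025</value>
  </property>
</configuration>
```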

Here is the strange part: if I don't set the 3 values described above,
then whenever the AppMaster is launched on a slave node it tries
connecting to the ResourceManager at the default address and not the
specified one. But with these values set, the AppMaster is always
launched on the master and everything seems fine...So I brought the
datanode on the master down and checked what happens...

Strangely, the jobs are not even assigned to any AppMaster...but I think
the AppMaster shouldn't have any property of getting sticky...the
ResourceManager should look for a free container and go ahead launching it there...

So now these things need to be resolved:
1) Why does it work fine only if the 3 mentioned values are set on the slave
nodes, with the AppMaster launched only on the master node?
2) If no data node is running on the master, the application doesn't get
assigned to any AppMaster at all.

I have attached my config files for your reference...[Renamed for better
reading and understanding]

Thanks for your response!

On Thu, Mar 6, 2014 at 7:34 AM, Mingjiang Shi <mshi@gopivotal.com> wrote:

> Sorry, it should be accessing http://<node_manager_ip>:8042/conf to check
> the value of yarn.resourcemanager.scheduler.address on the node manager.
> On Thu, Mar 6, 2014 at 9:36 AM, Mingjiang Shi <mshi@gopivotal.com> wrote:
>> Hi Sai,
>> A few questions:
>> 1. Which version of hadoop are you using? yarn.resourcemanager.hostname
>> is a new configuration which is not available in old versions.
>> 2. Does your yarn-site.xml contain
>> yarn.resourcemanager.scheduler.address? If yes, what's the value?
>> 3. Or you could access http://<resource_mgr>:8088/conf to check the
>> value of yarn.resourcemanager.scheduler.address.
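[That /conf endpoint returns the full effective configuration as XML; pulling out one property can be scripted. A minimal sketch, assuming Python 3; the helper name and the "master" host are mine, not part of this thread.]

```python
import xml.etree.ElementTree as ET

def find_property(conf_xml, name):
    """Return the value of the named property from a Hadoop /conf XML dump,
    or None if the property is not present."""
    root = ET.fromstring(conf_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

# Usage against a live cluster (hypothetical host "master", default RM web
# UI port 8088) would look like:
#   import urllib.request
#   xml_text = urllib.request.urlopen("http://master:8088/conf").read()
#   print(find_property(xml_text, "yarn.resourcemanager.scheduler.address"))
```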
>> On Thu, Mar 6, 2014 at 3:29 AM, Sai Prasanna <ansaiprasanna@gmail.com> wrote:
>>> Hi,
>>> I have a five node cluster: one master and 4 slaves. In fact, the master
>>> also has a data node running. Whenever the app master is launched on the
>>> master node, a simple wordcount program runs fine. But if it is launched
>>> on one of the slave nodes, the progress of the application gets hung.
>>> The problem is, though I have set yarn.resourcemanager.hostname to
>>> the IP address of the master, the slave connects only to the default.
>>> What could be the reason?
>>> I get the following message in the logs of app.master in web-UI.
>>> *"...Configuration: job.xml:an attempt to override final parameter:
>>> mapreduce.job.end-notification.max.retry.interval;  Ignoring.*
>>> *2014-03-05 20:15:50,597 WARN [main]
>>> org.apache.hadoop.conf.Configuration: job.xml:an attempt to override final
>>> parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
>>> 2014-03-05 20:15:50,603 INFO [main] org.apache.hadoop.yarn.client.RMProxy:
>>> Connecting to ResourceManager at /
>>> <>2014-03-05 20:15:56,632 INFO [main]
>>> org.apache.hadoop.ipc.Client: Retrying connect to server:
>>> <>. Already tried 0
>>> time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10,
>>> sleepTime=1000 MILLISECONDS)"*
