hadoop-common-user mailing list archives

From "Naganarasimha G R (Naga)" <garlanaganarasi...@huawei.com>
Subject RE: node remains unused after reboot
Date Wed, 23 Sep 2015 04:02:05 GMT
Hi Dmitry,
This seems to be an interesting case; I would like some more clarification:
1. How many NMs are there? Is it a heterogeneous cluster, or do all the nodes have the same
resource capacity? With 3000 cores and the same config on every node, I would expect around
100 nodes; am I correct?
2. How many applications are running, and how many have finished (basically, are still
available in the RM)? By 35000, do you mean finished and running applications?
3. Do tasks get assigned after some time? Also, is it only this host that is not getting
containers assigned, or do other hosts also get none?

I suspect this issue might be similar to YARN-3990, hence the above questions. You can also
check the RM logs and let us know whether you see log lines similar to the ones below:

2015-07-29 19:39:03,416 | INFO  | AsyncDispatcher event handler | Size of event-queue is 14000 | AsyncDispatcher.java:235
2015-07-29 19:39:03,417 | INFO  | AsyncDispatcher event handler | Size of event-queue is 15000 | AsyncDispatcher.java:235
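
If the RM log is large, a small script can pull these messages out. The following is a minimal sketch in Python, assuming only that each message starts with a timestamp like the ones above and contains the text "Size of event-queue is N". The function names are hypothetical and not part of Hadoop.

```python
import re

# Matches a leading timestamp and the queue size reported by AsyncDispatcher.
# Assumption: the message text is exactly as in the sample lines above; the
# fields between the timestamp and the message may vary with the log4j layout.
EVENT_QUEUE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*Size of event-queue is (\d+)"
)


def event_queue_sizes(lines):
    """Return a list of (timestamp, size) pairs, one per matching log line."""
    pairs = []
    for line in lines:
        match = EVENT_QUEUE_RE.match(line)
        if match:
            pairs.append((match.group(1), int(match.group(2))))
    return pairs


def peak_queue_size(pairs):
    """Return the largest reported queue size, or None if nothing matched."""
    if not pairs:
        return None
    return max(size for _, size in pairs)
```

To use it, open the RM log file (its location depends on your installation) and pass the file object to event_queue_sizes(). A peak in the thousands that keeps growing would look like the excerpt above.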

Regards,
+ Naga


________________________________
From: Dmitry Sivachenko [trtrmitya@gmail.com]
Sent: Wednesday, September 23, 2015 03:57
To: user@hadoop.apache.org
Subject: node remains unused after reboot

Hello!

I am using hadoop-2.7.1. I have a large map job running (total cores available on the cluster
about 3000, total tasks 35000).
In the middle of this process one server reboots.

After the reboot, the nodemanager starts successfully and registers with the resource manager:
2015-09-23 01:06:24,656 INFO  [main] nodemanager.NodeStatusUpdaterImpl (NodeStatusUpdaterImpl.java:registerWithRM(311)) - Notifying ContainerManager to unblock new container-requests

In the YARN web interface I see this host as active, but VCores used remains zero (see screenshot).
But the map job mentioned is still running and has about 12000 pending tasks.

Why does this host not receive tasks to run?

PS: I recently upgraded from 2.4.1 and I did not notice such a problem with 2.4.1: new tasks
were spawning immediately after reboot.

Thanks!




[screenshot attachment]
