hadoop-common-user mailing list archives

From Casey K <rocksu...@gmail.com>
Subject Re: YARN App Master logs and other qns
Date Thu, 03 Apr 2014 16:59:58 GMT
I was able to fix item (2) below.

Looking through the logs, I noticed that the node manager initiated
shutdown but was killed before it could finish. So I increased the value
of YARN_STOP_TIMEOUT from the default of 5 secs to 10 secs, and in some
cases to 30 secs. Is it normal to need timeouts longer than 10 secs?
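
For readers following along, here is a simplified sketch of the stop-with-timeout pattern described above. It is not the actual Hadoop stop script; the function name stop_with_timeout is hypothetical, and the sketch polls once per second rather than sleeping for the full timeout. It only assumes that YARN_STOP_TIMEOUT is read from the environment, with 5 seconds as the fallback mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: ask a process to stop, wait up to a timeout,
# then force-kill it. Illustrates why a slow shutdown ends in "kill -9".
stop_with_timeout() {
  pid=$1
  # Second argument wins; otherwise YARN_STOP_TIMEOUT; otherwise 5 seconds.
  timeout=${2:-${YARN_STOP_TIMEOUT:-5}}
  elapsed=0

  # Send SIGTERM; if the process is already gone there is nothing to do.
  kill "$pid" 2>/dev/null || return 0

  # Poll once per second until the process exits or the timeout expires.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$elapsed" -ge "$timeout" ]; then
      echo "process $pid did not stop gracefully after $timeout seconds: killing with kill -9"
      kill -9 "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    elapsed=$((elapsed + 1))
  done
  return 0
}
```

With this pattern, a larger value such as `export YARN_STOP_TIMEOUT=30` simply gives the daemon more time to finish its own shutdown before the forced kill.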

On Mon, Mar 31, 2014 at 2:32 PM, Casey K <rocksuser@gmail.com> wrote:

> Hello,
> I am fairly new to the Hadoop framework, so I appreciate your patience in
> case my email is not entirely correct or the terminology is wrong. I have
> a working installation. However, I am facing a few issues:
> 1) I have run the PI example a number of times. The number of slave nodes
> used is 4. Most times the runtime is about 31 secs. Other times, it varies
> widely and goes up to 650 secs. What could be causing this? This is a
> dedicated cluster with no other workloads.
> 2) "nodemanager did not stop gracefully after 5 seconds: killing with kill
> -9" Every time during shutdown, the nodemanager is forcibly killed because
> it doesn't respond within 5 seconds. I dug through the logs and didn't find
> anything off. One thing I did find is noted in (3).
> 3) I see errors as follows: "2014-03-31 12:27:26,975 ERROR [RMCommunicator
> Allocator] org.apache.hadoop.mapreduce.v2.app.rm.RMContainerAllocator:
> Container complete event for unknown container id
> container_1396286812424_0001_01_000042" My searches indicate this is
> because the connection to the appmaster is lost. I can't seem to find where
> the appmaster logs are.
> 4) Is the proxy server needed? I did not set "yarn.web-proxy.address" and
> so it never starts. My understanding is that it starts as a part of the RM
> in this case.
> 5) RDMA based shuffle - Mellanox seems to have contributed code for RDMA
> shuffle instead of HTTP. Is this part of YARN? If yes, how do I enable it?
> Is UDA required for RDMA shuffle?
> 6) If I want to provide support for a new file system, is there a tutorial
> on what needs to be implemented? I found that
> org.apache.hadoop.fs.FileSystem is the class to extend. However, sample
> code or documentation would help.
> Appreciate the help.
> Regards,
> Casey
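
On item (3) of the quoted message, one possible way to locate the appmaster logs is to derive the application ID from the container ID in the error and pass it to the `yarn logs` command. The sketch below assumes the container ID layout seen in that error (container_<cluster timestamp>_<app number>_<attempt>_<container number>); the helper name container_to_app_id is hypothetical, and `yarn logs` only returns output if log aggregation is enabled on the cluster.

```shell
#!/bin/sh
# Hypothetical helper: map a YARN container ID to its application ID.
# Assumes the layout container_<timestamp>_<app>_<attempt>_<container>.
container_to_app_id() {
  echo "$1" | awk -F_ '{ print "application_" $2 "_" $3 }'
}

app_id=$(container_to_app_id container_1396286812424_0001_01_000042)
echo "$app_id"   # prints application_1396286812424_0001

# Fetch the aggregated logs for that application, appmaster included.
# Left commented out because it needs a running cluster with log
# aggregation enabled:
# yarn logs -applicationId "$app_id"
```

If log aggregation is not enabled, the container logs stay on the node that ran the container, under the node manager's configured log directories.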