hadoop-yarn-issues mailing list archives

From "Haibo Chen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits
Date Wed, 17 Feb 2016 17:49:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15150864#comment-15150864 ]

Haibo Chen commented on YARN-4697:

Hi Naganarasimha G R, 

Thanks very much for your comments. I have addressed the threadPool accessibility issue and
also modified yarn-default.xml to match YarnConfiguration. To answer your other comments:

1. Yes, 50 should be safe (the default I set is 100). But 50 threads dedicated to log
aggregation alone may still be too many on some clusters, while users with powerful machines
and many YARN applications may want more than 50. If the limit is configurable, users can
decide for themselves.

2. The purpose of the semaphore is to block the threads in the thread pool: the main
thread always acquires the semaphore first. Because I set the thread pool size to 1, once
that single thread tries to acquire the semaphore while executing either of the two runnables,
it blocks, and the other runnable will not be executed if the thread pool can indeed create
only 1 thread. (If another thread were available in the thread pool, a second thread would
also block on the semaphore, failing the test.) The release immediately after the acquire in
each runnable is just there to release the permit safely. I'll try to add comments in the test code.
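
To make the idea in point 2 concrete, here is a minimal standalone sketch of the technique. It is not the test from the patch: the class and method names are hypothetical, and the short sleep is a crude stand-in for whatever synchronization the real test uses. The caller holds the only permit, each task tries to acquire it, and we count how many tasks managed to start while the permit was held.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the actual test code from the patch.
public class PoolBoundCheck {

    /** Returns how many tasks started running while the caller held the semaphore. */
    static int tasksStartedWhileBlocked(ExecutorService pool, int tasks)
            throws InterruptedException {
        Semaphore semaphore = new Semaphore(1);
        AtomicInteger started = new AtomicInteger();

        // The calling (main) thread takes the only permit first.
        semaphore.acquire();

        Runnable blocker = () -> {
            started.incrementAndGet();
            try {
                semaphore.acquire();   // blocks until the caller releases
                semaphore.release();   // hand the permit back right away
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < tasks; i++) {
            pool.execute(blocker);
        }

        // Assumption: 500 ms is long enough for the pool to start every
        // task it is able to start. A real test would want something sturdier.
        Thread.sleep(500);
        int observed = started.get();

        // Unblock the workers and let the pool drain.
        semaphore.release();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return observed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fixed(1): "
                + tasksStartedWhileBlocked(Executors.newFixedThreadPool(1), 2));
        System.out.println("cached:   "
                + tasksStartedWhileBlocked(Executors.newCachedThreadPool(), 2));
    }
}
```

With a pool limited to one thread, only one task can be blocked on the semaphore at a time, so the second task stays queued. An unbounded pool would be expected to start both tasks, which is the condition the test treats as a failure.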

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>                 Key: YARN-4697
>                 URL: https://issues.apache.org/jira/browse/YARN-4697
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: yarn4697.001.patch
> In LogAggregationService.java we create a thread pool to upload logs from the NodeManager
to HDFS if log aggregation is turned on. This is a cached thread pool, which according to the
javadoc is an unlimited pool of threads.
> If we have had a problem with log aggregation, this could cause a problem
on restart. The number of threads created at that point could be huge and would put a large
load on the NameNode, and in the worst case could even bring it down due to file descriptor issues.
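
For illustration, the change under discussion (replacing the cached pool with a bounded one whose size is configurable) might look roughly like the sketch below. This is not the code from yarn4697.001.patch: it uses java.util.Properties in place of the Hadoop configuration classes, and the property name is a made-up placeholder rather than the real YARN key. Only the default of 100 comes from the comment above.

```java
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

// Hypothetical sketch; names below are illustrative, not from the patch.
public class BoundedPoolSketch {

    // Placeholder property name, NOT the actual YARN configuration key.
    static final String POOL_SIZE_KEY = "example.log-aggregation.thread-pool-size";

    // Default mentioned in the comment thread.
    static final int DEFAULT_POOL_SIZE = 100;

    /** Reads the pool size, falling back to the default on missing or invalid values. */
    static int resolvePoolSize(Properties conf) {
        String raw = conf.getProperty(POOL_SIZE_KEY);
        if (raw == null) {
            return DEFAULT_POOL_SIZE;
        }
        try {
            int size = Integer.parseInt(raw.trim());
            return size > 0 ? size : DEFAULT_POOL_SIZE;
        } catch (NumberFormatException e) {
            return DEFAULT_POOL_SIZE;
        }
    }

    /** Creates a pool that never runs more than the configured number of threads. */
    static ExecutorService createPool(Properties conf) {
        // Unlike Executors.newCachedThreadPool(), extra tasks wait in a queue
        // instead of each getting a new thread.
        return Executors.newFixedThreadPool(resolvePoolSize(conf));
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(POOL_SIZE_KEY, "50");
        ExecutorService pool = createPool(conf);
        System.out.println("max threads: "
                + ((ThreadPoolExecutor) pool).getMaximumPoolSize());
        pool.shutdown();
    }
}
```

Falling back to the default on a non-positive or unparsable value is a choice made for this sketch; rejecting the value with an error would be an equally reasonable design.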

This message was sent by Atlassian JIRA