hadoop-yarn-issues mailing list archives

From "Robert Kanter (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (YARN-4697) NM aggregation thread pool is not bound by limits
Date Fri, 19 Feb 2016 22:32:18 GMT

    [ https://issues.apache.org/jira/browse/YARN-4697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155015#comment-15155015 ]

Robert Kanter commented on YARN-4697:

Besides [~wilfreds]'s comments, I have some feedback on the unit test:
- We should use more than 1 thread in the thread pool, because a single thread can sometimes
hide problems.  Something like 3 would be better.
- In case something goes wrong, it would be good to:
-- add a timeout to the test {{@Test(timeout=30000)}}
-- make sure the threads cannot block indefinitely.  That can be done by using [{{tryAcquire(long
timeout, TimeUnit unit)}}|https://docs.oracle.com/javase/7/docs/api/java/util/concurrent/Semaphore.html#tryAcquire(long,%20java.util.concurrent.TimeUnit)]
instead of just {{acquire()}}.  If you make the timeout for the threads longer than the timeout
for the test itself, you won't have to worry about timing problems with threads exiting
early, while still preventing the threads from hanging forever.
- The way you're searching for threads is okay, but it would be better if we could get them
directly from the thread pool.  I see that {{LogAggregationService}} only exposes an {{ExecutorService}}
for the thread pool, but looking at how it's made, I believe it's actually a {{ThreadPoolExecutor}}
underneath.  Can you try casting to {{ThreadPoolExecutor}} and see if that works?  {{ThreadPoolExecutor}}
has methods such as {{getPoolSize()}} and {{getActiveCount()}} to check how many threads are running.  If that doesn't work, then I'm okay
with the current approach.
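A rough, self-contained sketch of the test shape suggested above, written as a plain Java class rather than a JUnit test and without the real {{LogAggregationService}}. {{BoundedPoolCheck}}, {{run}} and the numbers (3 threads, 5 tasks, 60 s wait) are illustrative assumptions; {{Executors.newFixedThreadPool}} only stands in for whatever the service actually builds, so whether the real pool casts to {{ThreadPoolExecutor}} still needs to be checked.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedPoolCheck {

    /** Pool statistics captured while the first batch of tasks is blocked. */
    static final class Result {
        final int poolSize;
        final int activeCount;
        final int queued;
        final long completed;

        Result(int poolSize, int activeCount, int queued, long completed) {
            this.poolSize = poolSize;
            this.activeCount = activeCount;
            this.queued = queued;
            this.completed = completed;
        }
    }

    static Result run(int threads, int tasks) throws InterruptedException {
        // Hypothetical stand-in for the ExecutorService a service would expose.
        ExecutorService exposed = Executors.newFixedThreadPool(threads);
        if (!(exposed instanceof ThreadPoolExecutor)) {
            throw new IllegalStateException("not a ThreadPoolExecutor; fall back to scanning threads");
        }
        ThreadPoolExecutor pool = (ThreadPoolExecutor) exposed;

        Semaphore release = new Semaphore(0);
        CountDownLatch started = new CountDownLatch(Math.min(threads, tasks));
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                started.countDown();
                try {
                    // Bounded wait (longer than the test's own timeout) instead of acquire(),
                    // so a broken test cannot leave this thread parked forever.
                    release.tryAcquire(60, TimeUnit.SECONDS);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // Wait until every worker thread is running a task before sampling the pool.
        if (!started.await(30, TimeUnit.SECONDS)) {
            throw new IllegalStateException("worker threads did not start in time");
        }
        int poolSize = pool.getPoolSize();
        int active = pool.getActiveCount();
        int queued = pool.getQueue().size();

        // Let every task finish, then shut the pool down.
        release.release(tasks);
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return new Result(poolSize, active, queued, pool.getCompletedTaskCount());
    }

    public static void main(String[] args) throws InterruptedException {
        Result r = run(3, 5);
        System.out.println("poolSize=" + r.poolSize + " active=" + r.activeCount
                + " queued=" + r.queued + " completed=" + r.completed);
    }
}
```

With 3 threads and 5 tasks, the snapshot should show 3 threads in the pool, all 3 active and 2 tasks queued, with 5 completed after shutdown. In an actual unit test the same checks would sit inside a {{@Test(timeout=30000)}} method.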

> NM aggregation thread pool is not bound by limits
> -------------------------------------------------
>                 Key: YARN-4697
>                 URL: https://issues.apache.org/jira/browse/YARN-4697
>             Project: Hadoop YARN
>          Issue Type: Improvement
>          Components: nodemanager
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: yarn4697.001.patch, yarn4697.002.patch
> In LogAggregationService.java we create a thread pool to upload logs from the NodeManager
> to HDFS if log aggregation is turned on. This is a cached thread pool, which, according to the javadoc,
> is an unbounded pool of threads.
> If we have had a problem with log aggregation, this could cause a problem
> on restart. The number of threads created at that point could be huge, putting a large
> load on the NameNode, and in the worst case it could even bring it down due to file descriptor issues.
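
To make the concern concrete, here is a minimal sketch comparing an unbounded cached pool with a bounded one under a backlog of blocked uploads. The class, {{peakThreads}} and the {{maxThreads}} limit are hypothetical: this is not the LogAggregationService code or either attached patch, and {{maxThreads}} is not a real YARN configuration property.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class UploadPoolDemo {

    /** Unbounded: same parameters Executors.newCachedThreadPool() uses in OpenJDK. */
    static ThreadPoolExecutor cachedPool() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE,
                60L, TimeUnit.SECONDS, new SynchronousQueue<>());
    }

    /** Bounded: at most maxThreads threads; extra uploads wait in the queue. */
    static ThreadPoolExecutor boundedPool(int maxThreads) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(maxThreads, maxThreads,
                60L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        // Let idle threads exit so a quiet NodeManager does not keep maxThreads threads alive.
        pool.allowCoreThreadTimeOut(true);
        return pool;
    }

    /** Submits `uploads` blocked tasks (a restart backlog) and returns the thread count reached. */
    static int peakThreads(ThreadPoolExecutor pool, int uploads) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        for (int i = 0; i < uploads; i++) {
            pool.execute(() -> {
                try {
                    done.await(30, TimeUnit.SECONDS); // stand-in for a slow HDFS upload
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int threads = pool.getPoolSize();
        done.countDown();
        pool.shutdown();
        pool.awaitTermination(30, TimeUnit.SECONDS);
        return threads;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("cached:  " + peakThreads(cachedPool(), 50) + " threads");
        System.out.println("bounded: " + peakThreads(boundedPool(3), 50) + " threads");
    }
}
```

With 50 backlogged uploads the cached pool creates one thread per upload (50), while the bounded pool stays at 3. Core and maximum size are set equal because a {{ThreadPoolExecutor}} backed by an unbounded {{LinkedBlockingQueue}} never grows past its core size anyway; {{allowCoreThreadTimeOut(true)}} lets those threads exit again once the backlog drains.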

This message was sent by Atlassian JIRA
