hadoop-mapreduce-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4662) JobHistoryFilesManager thread pool never expands
Date Thu, 20 Sep 2012 16:25:08 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459713#comment-13459713 ]

Kihwal Lee commented on MAPREDUCE-4662:
---------------------------------------

bq. One solution is to specify maximum number of queued requests for LinkedBlockingQueue.

That could work, but this solution needs more changes. When the queue is full and the maximum
number of threads are already running, a new task will be rejected. We could apply CallerRunsPolicy,
but the whole point of having a ThreadPoolExecutor is to keep job completion from blocking
the JobTracker.
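
As a minimal sketch of the rejection case (not JobTracker code; the class name RejectionDemo and the tiny sizes are made up for illustration, and it uses Java 8 lambdas for brevity), a pool with one thread and a queue of one rejects the third task under the default AbortPolicy:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
class RejectionDemo {
    /** Returns true if the third submission is rejected. */
    static boolean thirdTaskRejected() {
        // One thread at most, and room for exactly one queued task.
        ThreadPoolExecutor executor = new ThreadPoolExecutor(1, 1, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>(1));
        CountDownLatch release = new CountDownLatch(1);
        // Each task blocks until released, so nothing drains while we submit.
        Runnable work = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean rejected = false;
        try {
            executor.execute(work); // runs on the single worker thread
            executor.execute(work); // waits in the queue
            executor.execute(work); // queue full and pool at max: rejected
        } catch (RejectedExecutionException e) {
            rejected = true;
        } finally {
            release.countDown();
            executor.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("third task rejected: " + thirdTaskRejected());
    }
}
```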

I think the main requirements here are:
* Absorb bursty job completions - queue with sufficient capacity, or dispatch quickly to
a large thread pool.
* Avoid limiting job throughput - have enough worker threads.
* Minimize consumption of extra resources - limit the number of worker threads.
* Don't drop anything.

To satisfy the first and second requirements, one can think of the following two approaches.

* Have a bounded queue and a sufficiently large thread pool. Since we cannot drop any job
completion, we want CallerRunsPolicy for rejected ones. 

* Alternatively, use an unbounded queue and a reasonable number of core threads. No work will
be rejected in this case.

Between the two, the second one has the advantage, considering the third requirement and its
simplicity. The question is, what is a reasonable number of core threads to avoid lagging
behind forever? Based on our experience, 3 to 5 seems reasonable. The moveToDone() throughput
varies a lot, but it topped out at around 0.8/second in one of the busiest clusters I've seen.
If the job completion rate stays above this rate for a long time, the queue will grow and
history won't show up for most of the newer jobs.

Here are the two approaches in code:

* The queue is bounded but will absorb bursts of about 100. If the core thread cannot keep
up and the queue fills, up to 9 more threads (10 in total) will be created to help the core
thread drain the queue. If the queue still cannot be drained fast enough, the caller will
execute the work directly. This will block the JobTracker, since JobTracker#finalizeJob() is
a synchronized method. So the thread pool size and the queue size must be sufficiently large.

{noformat}
 executor = new ThreadPoolExecutor(1, 10, 1, TimeUnit.HOURS, 
     new LinkedBlockingQueue<Runnable>(100), new ThreadPoolExecutor.CallerRunsPolicy());
{noformat}
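
To make the fallback concrete, here is a small sketch with deliberately tiny sizes (the class CallerRunsDemo and its parameters are made up for illustration; Java 8 lambdas are used for brevity). Once the pool is at its maximum and the queue is full, CallerRunsPolicy runs the overflow task on the submitting thread, which in the JobTracker case would be the thread holding the lock:

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
class CallerRunsDemo {
    /** Returns how many of the submitted tasks ran on the submitting thread. */
    static int tasksRunByCaller(int core, int max, int queueCapacity, int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(core, max, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Thread caller = Thread.currentThread();
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> ranOn = new CopyOnWriteArrayList<>();
        Runnable work = () -> {
            ranOn.add(Thread.currentThread());
            // Pool threads block so the queue cannot drain during submission;
            // the caller must not block, or the demo would hang.
            if (Thread.currentThread() != caller) {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        };
        for (int i = 0; i < tasks; i++) {
            executor.execute(work);
        }
        int byCaller = 0;
        for (Thread t : ranOn) {
            if (t == caller) {
                byCaller++;
            }
        }
        release.countDown();
        executor.shutdown();
        return byCaller;
    }

    public static void main(String[] args) {
        // 1 core thread, 2 max, queue of 2, 5 tasks: the fifth has nowhere to go.
        System.out.println("run by caller: " + tasksRunByCaller(1, 2, 2, 5));
    }
}
```

With these sizes, two tasks occupy the two threads, two sit in the queue, and the remaining one runs on the caller.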


* The following will eventually start up 5 threads and keep them running. It is non-blocking
and requires the least amount of changes.

{noformat}
 executor = new ThreadPoolExecutor(5, 5, 1, TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
{noformat}
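
"Eventually" here means one new thread per submission until the core size is reached; after that, work is queued. A rough sketch of that ramp-up (FixedPoolDemo is a made-up name, and the tasks simply block so the pool size can be observed; Java 8 lambdas for brevity):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
class FixedPoolDemo {
    /** Returns the pool size observed after each of the `tasks` submissions. */
    static int[] poolSizesAfterEachSubmit(int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(5, 5, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        // Tasks block until released so that every thread stays busy.
        Runnable work = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int[] sizes = new int[tasks];
        for (int i = 0; i < tasks; i++) {
            executor.execute(work);
            sizes[i] = executor.getPoolSize();
        }
        release.countDown();
        executor.shutdown();
        return sizes;
    }

    public static void main(String[] args) {
        // Grows by one per submission up to 5, then stays at 5.
        System.out.println(java.util.Arrays.toString(poolSizesAfterEachSubmit(8)));
    }
}
```

If keeping 5 idle threads around is a concern for the third requirement, ThreadPoolExecutor#allowCoreThreadTimeOut(true) lets idle core threads exit after the keep-alive time; whether that is worth it here is a judgment call.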

What do you think is better? Or can you think of any better approaches?
                
> JobHistoryFilesManager thread pool never expands
> ------------------------------------------------
>
>                 Key: MAPREDUCE-4662
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4662
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver
>    Affects Versions: 1.0.2
>            Reporter: Thomas Graves
>
> The job history file manager creates a thread pool with a core size of 1 thread and a max pool size of 3. It never goes beyond 1 thread, though, because it's using a LinkedBlockingQueue which doesn't have a max size.
>     void start() {
>       executor = new ThreadPoolExecutor(1, 3, 1,
>           TimeUnit.HOURS, new LinkedBlockingQueue<Runnable>());
>     }
> According to the ThreadPoolExecutor Javadoc, it only increases the number of threads when the queue is full. Since the queue we are using has no max size, it never fills up and we never get more than 1 thread.
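
The behavior described in the issue can be reproduced outside Hadoop with a short sketch (UnboundedQueueDemo is a made-up name; the task count of 50 is arbitrary; Java 8 lambdas for brevity). With the same (1, 3) configuration and an unbounded queue, every task after the first is queued and the pool never grows:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class, not part of Hadoop.
class UnboundedQueueDemo {
    /** Submits `tasks` blocking jobs; returns {largest pool size, queued tasks}. */
    static int[] observe(int core, int max, int tasks) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(core, max, 1, TimeUnit.HOURS,
                new LinkedBlockingQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        // Tasks block until released so that the backlog is visible.
        Runnable work = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < tasks; i++) {
            executor.execute(work);
        }
        int[] result = { executor.getLargestPoolSize(), executor.getQueue().size() };
        release.countDown();
        executor.shutdown();
        return result;
    }

    public static void main(String[] args) {
        int[] r = observe(1, 3, 50);
        System.out.println("largest pool size: " + r[0] + ", queued: " + r[1]);
    }
}
```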

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
