hadoop-common-issues mailing list archives

From "Staffan Friberg (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HADOOP-12528) Avoid spinning in CallQueueManager.take()
Date Mon, 02 Nov 2015 18:44:27 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-12528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14985726#comment-14985726 ]

Staffan Friberg commented on HADOOP-12528:

The number of entries into Thread Park during 10 minutes on a NN with 60 IPC threads goes down
from 36000 to around 900. It seems like one group of IPC threads wakes up every 20 s and the other
every 2 minutes. I was doing a large file delete, so I'm not sure whether that affects the
heartbeating/communication in any way other than the amount of data transferred.

So with the 1 s poll, you will have 60 threads waking up each second and then going back to
sleep again; for large clusters with more IPC threads this number would be even higher.
How many IPC threads will a very large cluster be configured with?
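The 36000 figure above lines up with a simple estimate: if every handler thread stays idle and wakes once per poll timeout, the number of wakeups is threads × (window / timeout). A small sketch of that arithmetic, using a hypothetical helper `idleWakeups`; the 200-thread case is only an illustrative assumption, not a measured configuration.

```java
// Hypothetical helper: estimates timeout wakeups for idle handlers that poll().
final class PollWakeupEstimate {

    // Assumes every handler is idle for the whole window and that
    // windowSeconds is a multiple of pollIntervalSeconds.
    static long idleWakeups(int handlerThreads, long windowSeconds, long pollIntervalSeconds) {
        return handlerThreads * (windowSeconds / pollIntervalSeconds);
    }

    public static void main(String[] args) {
        // 60 handlers, 10 minutes, 1 s poll timeout
        System.out.println(idleWakeups(60, 600, 1));   // prints 36000
        // Illustrative larger configuration (assumed, not measured)
        System.out.println(idleWakeups(200, 600, 1));  // prints 120000
    }
}
```

The estimate grows linearly with the handler count, which is why the cost matters more on large clusters; with a blocking take() the idle term disappears and only real work (or other periodic activity) wakes a thread.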

The other cost is that each time a thread enters the wait, a small allocation for the
synchronization object occurs.

What JVM metrics are you collecting and how?

> Avoid spinning in CallQueueManager.take()
> -----------------------------------------
>                 Key: HADOOP-12528
>                 URL: https://issues.apache.org/jira/browse/HADOOP-12528
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: performance
>    Affects Versions: 2.7.1
>            Reporter: Staffan Friberg
>            Assignee: Staffan Friberg
>            Priority: Minor
>         Attachments: HADOOP-12528.001.patch, HADOOP-12528.002.patch
> When IPC threads (Server$Handler) do take() to get the next Call, the CallQueueManager does a poll() instead of take() on the internal queue.
> This causes threads to wake up, unnecessarily waste some CPU, and do extra allocation as part of the internal await/signal mechanism each time a thread redoes poll().
> This patch uses take() on the queue instead of poll(), which keeps threads in the await state until work is available. Since threads will be blocked on the queue indefinitely, swapping queues requires a bit of extra work to make sure threads wake up and do take() on the new queue.
> Updated the test TestCallQueueManager.testSwapUnderContention() to ensure that no threads get stuck on the old queue as part of swapping.
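The quoted description replaces a poll() loop with a blocking take() and notes that queue swapping then needs extra work, because handlers parked on the old queue would otherwise never notice the swap. Below is a minimal sketch of one way to do that, assuming a hypothetical `SwappableCallQueue` class (not Hadoop's CallQueueManager, and not necessarily what the attached patches do). `take()` blocks with no timeout. `swapQueue()` redirects producers, waits for the old queue to drain, switches consumers, and then puts one sentinel ("poison pill") per handler thread into the old queue so that any handler still parked there wakes up and retries on the new queue. The sentinel technique, the fixed handler count, and all names are assumptions made for illustration.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, simplified swappable queue; illustrative only.
final class SwappableCallQueue<E> {
    // Sentinel telling a consumer to re-read takeRef and block on the current queue.
    private static final Object WAKEUP = new Object();

    private final AtomicReference<BlockingQueue<Object>> putRef;
    private final AtomicReference<BlockingQueue<Object>> takeRef;
    private final int handlerThreads; // assumed fixed number of consumer threads

    SwappableCallQueue(int handlerThreads) {
        BlockingQueue<Object> q = new LinkedBlockingQueue<>();
        this.putRef = new AtomicReference<>(q);
        this.takeRef = new AtomicReference<>(q);
        this.handlerThreads = handlerThreads;
    }

    public void put(E e) throws InterruptedException {
        putRef.get().put(e);
    }

    @SuppressWarnings("unchecked")
    public E take() throws InterruptedException {
        while (true) {
            // Parks until an element arrives: no periodic timeout wakeups.
            Object o = takeRef.get().take();
            if (o != WAKEUP) {
                return (E) o;
            }
            // Woken by a swap: loop and block on the queue takeRef now points to.
        }
    }

    // Assumes swaps are not concurrent and handlers keep running while the old queue drains.
    public void swapQueue(BlockingQueue<Object> newQ) throws InterruptedException {
        BlockingQueue<Object> oldQ = putRef.getAndSet(newQ); // new calls go to newQ
        while (!oldQ.isEmpty()) {                            // let handlers finish queued calls
            TimeUnit.MILLISECONDS.sleep(1);
        }
        takeRef.set(newQ);                                   // later take() calls use newQ
        for (int i = 0; i < handlerThreads; i++) {
            oldQ.put(WAKEUP);                                // unpark handlers still blocked on oldQ
        }
    }
}
```

Each handler consumes at most one sentinel, because after it sees one it re-reads `takeRef`, which already points at the new queue; leftover sentinels stay in the discarded old queue. The sketch ignores the narrow race where a producer that read the old queue reference enqueues after the drain check.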

This message was sent by Atlassian JIRA
