hbase-issues mailing list archives

From "Elliott Clark (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-15146) Don't block on Reader threads queueing to a scheduler queue
Date Wed, 22 Jun 2016 06:23:57 GMT

    [ https://issues.apache.org/jira/browse/HBASE-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15343790#comment-15343790 ]

Elliott Clark edited comment on HBASE-15146 at 6/22/16 6:23 AM:
----------------------------------------------------------------

bq.In general, gradually reducing performance is rather preferable in heavy load.
We've found the exact opposite many times. Pushing back on the client is a well-known
and well-understood load-shedding mechanism. It lets the server take on what it can handle
and no more.
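
To make the push-back idea concrete, here is a minimal Java sketch. It is not the HBASE-15146 patch or the HBase scheduler API; the class and method names (LoadSheddingDispatch, dispatch) are hypothetical. It only assumes a bounded java.util.concurrent queue: the dispatcher uses the non-blocking offer() and reports a rejection when the queue is full, so the caller can tell the client to back off.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration of load shedding; not HBase code.
class LoadSheddingDispatch {
    private final BlockingQueue<String> callQueue;
    private int rejected = 0;

    LoadSheddingDispatch(int capacity) {
        this.callQueue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns true if the call was queued, false if the client should be pushed back. */
    boolean dispatch(String call) {
        // offer() never blocks: it returns false immediately when the queue is full.
        boolean accepted = callQueue.offer(call);
        if (!accepted) {
            rejected++; // the caller would answer the client with a "server busy" error
        }
        return accepted;
    }

    int queuedCount() { return callQueue.size(); }

    int rejectedCount() { return rejected; }

    public static void main(String[] args) {
        LoadSheddingDispatch dispatcher = new LoadSheddingDispatch(2);
        for (int i = 0; i < 5; i++) {
            System.out.println("call-" + i + " accepted=" + dispatcher.dispatch("call-" + i));
        }
        System.out.println("queued=" + dispatcher.queuedCount()
            + " rejected=" + dispatcher.rejectedCount());
    }
}
```

With a capacity of 2 and no handler draining the queue, the first two calls are accepted and the rest are rejected right away instead of piling up behind a promise the server can't keep.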

By contrast, every time the server promises to do work that it can't handle, things get worse.
GC gets worse, queue call times get worse, and it becomes a vicious cycle that continues until a
regionserver is inoperable. Removing threads that can call select leads to multiple seconds
where no TCP acks are sent. On loaded servers we saw all reader threads completely stop making
network selects.

bq.Selector.select immediately causes a context switch when an event occurs, 

Yes it does, and you want to get the reader threads back to calling select as fast as
possible. That's the most basic tenet of an event loop. What was happening was that the threads
would stop for multiple seconds because the queues were full. That meant the event loop was
stopped.
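
A rough way to see the stall, as a self-contained Java sketch rather than anything from the patch: the class name ReaderStallSketch and the method stallMillis are hypothetical, and a timed offer() stands in for a blocking put() so the demo terminates. It measures how long a "reader" is held up handing one call to a queue that is already full.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of a reader stalling on a full queue; not HBase code.
class ReaderStallSketch {
    /** Milliseconds the calling thread spends trying to hand one call to a full queue. */
    static long stallMillis(boolean blocking, long maxWaitMillis) {
        BlockingQueue<String> fullQueue = new ArrayBlockingQueue<>(1);
        fullQueue.add("in-flight call"); // nothing drains the queue in this sketch

        long start = System.nanoTime();
        try {
            if (blocking) {
                // Stand-in for put(): waits for space, bounded so the demo ends.
                fullQueue.offer("new call", maxWaitMillis, TimeUnit.MILLISECONDS);
            } else {
                // Returns false immediately; the thread can go straight back to select().
                fullQueue.offer("new call");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("blocking hand-off stalled ms: " + stallMillis(true, 200));
        System.out.println("non-blocking hand-off stalled ms: " + stallMillis(false, 200));
    }
}
```

While the thread is parked in the blocking hand-off it cannot call select, which is the window in which no reads happen on its connections.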

bq.and this patch might make worse performance in such subtle heavy congestion.
The opposite has held true under load.


was (Author: eclark):
bq.In general, gradually reducing performance is rather preferable in heavy load.
We've found the exact opposite many times. Pushing back on the client is a well-known
and well-understood load-shedding mechanism. It lets the server take on what it can handle
and no more.

By contrast, every time the server promises to do work that it can't handle, things get worse.
GC gets worse, queue call times get worse, and it becomes a vicious cycle that continues until a
regionserver is inoperable. Removing threads that can call select leads to multiple seconds
where no TCP acks are sent. On loaded servers we saw all reader threads completely stop making
network selects.

bq.Selector.select immediately causes a context switch when an event occurs, and this patch
might make worse performance in such subtle heavy congestion.

Yes it does, and you want to get the reader threads back to calling select as fast as
possible. That's the most basic tenet of an event loop. What was happening was that the threads
would stop for multiple seconds because the queues were full. That meant the event loop was
stopped.

> Don't block on Reader threads queueing to a scheduler queue
> -----------------------------------------------------------
>
>                 Key: HBASE-15146
>                 URL: https://issues.apache.org/jira/browse/HBASE-15146
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Elliott Clark
>            Assignee: Elliott Clark
>            Priority: Blocker
>             Fix For: 2.0.0, 1.2.0, 1.3.0
>
>         Attachments: HBASE-15146-v7.patch, HBASE-15146-v8.patch, HBASE-15146-v8.patch,
HBASE-15146.0.patch, HBASE-15146.1.patch, HBASE-15146.2.patch, HBASE-15146.3.patch, HBASE-15146.4.patch,
HBASE-15146.5.patch, HBASE-15146.6.patch
>
>
> Blocking on the epoll thread is awful. The new rpc scheduler can have lots of different queues. Those queues have different capacity limits. Currently the dispatch method can block trying to add to the blocking queue in any of the schedulers.
> This causes readers to block, tcp acks are delayed, and everything slows down.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
