hbase-issues mailing list archives

From "Duo Zhang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16583) Staged Event-Driven Architecture
Date Fri, 09 Sep 2016 00:52:22 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15475492#comment-15475492 ]

Duo Zhang commented on HBASE-16583:

I'd say TPC is the dream but SEDA is the reality :)

For C*, the first problem is how to deal with disk IO. There is no AIO filesystem implementation
in Java, and after studying ScyllaDB, one of the real-world TPC implementations, I
found that only XFS has good AIO support... So they still need a thread pool for disk
IO, and that is still SEDA :)

And we have the same problem... Although most of our IO is network-based, we have short circuit

And TPC will make the code very, very fragile: a simple sleep or some other time-consuming operation
can stall the whole server. ScyllaDB has a really powerful framework for writing TPC
code. For example, a sleep in that framework does not block the current thread; it is equivalent to
scheduling a delayed task. What's more, for time-consuming work such as filtering or compaction,
we would need to cut the work into time slices and do the scheduling ourselves. The ScyllaDB developers
used to work on KVM, if I remember correctly, so maybe this is not a difficult mission for them. But for
us, with Java, I will not say it is impossible, but...
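
To make the two points above concrete (a sleep that is really a delayed task, and long work cut into time slices), here is a minimal Java sketch. It is only an illustration, not ScyllaDB's framework and not HBase code: the class EventLoopSketch, its method names, and the single-threaded ScheduledExecutorService standing in for one core's event loop are all assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: one single-threaded executor stands in for one core's event loop.
final class EventLoopSketch {
    private final ScheduledExecutorService loop = Executors.newSingleThreadScheduledExecutor();

    // "Sleep" without blocking the loop thread: the wake-up is just a delayed task.
    CompletableFuture<Void> sleep(long millis) {
        CompletableFuture<Void> done = new CompletableFuture<>();
        loop.schedule(() -> done.complete(null), millis, TimeUnit.MILLISECONDS);
        return done;
    }

    // Long-running work (think filtering or compaction) cut into slices.
    // After each slice the task re-queues itself, so other tasks can run in between.
    CompletableFuture<Long> sumInSlices(int[] data, int sliceSize) {
        if (sliceSize <= 0) {
            throw new IllegalArgumentException("sliceSize must be positive");
        }
        CompletableFuture<Long> result = new CompletableFuture<>();
        loop.execute(() -> runSlice(data, 0, sliceSize, 0L, result));
        return result;
    }

    private void runSlice(int[] data, int from, int sliceSize, long acc,
                          CompletableFuture<Long> result) {
        int to = Math.min(from + sliceSize, data.length);
        long sum = acc;
        for (int i = from; i < to; i++) {
            sum += data[i];
        }
        if (to == data.length) {
            result.complete(sum);          // last slice: publish the result
        } else {
            final long carried = sum;      // hand the partial sum to the next slice
            loop.execute(() -> runSlice(data, to, sliceSize, carried, result));
        }
    }

    void shutdown() {
        loop.shutdown();
    }
}
```

Even in this toy version the fragility is visible: a single Thread.sleep or blocking read inside any task would stall everything else queued on the same loop thread, and nothing in the language stops a contributor from writing one.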

Will add more comments later. I need to go to work now...


> Staged Event-Driven Architecture
> --------------------------------
>                 Key: HBASE-16583
>                 URL: https://issues.apache.org/jira/browse/HBASE-16583
>             Project: HBase
>          Issue Type: Umbrella
>            Reporter: Phil Yang
> Staged Event-Driven Architecture (SEDA) splits request-handling logic into several stages;
> each stage is executed in a thread pool, and the stages are connected by queues.
> Currently, in the region server we use a thread pool to handle requests from clients. The
> number of handlers is configurable, and reads and writes use different pools. The current architecture
> has two limitations:
> Performance:
> Different parts of the handling path have different bottlenecks. For example, accessing
> MemStore and cache mainly consumes CPU, but accessing HDFS mainly consumes network/disk IO.
> If we use SEDA and split them into two different stages, we can size the
> two pools differently, according to the CPU/disk/network performance, case by case.
> Availability:
> HBASE-16388 describes a scenario in which, if the client uses a thread pool and blocking methods
> to access region servers, a single slow server may exhaust most of the client's threads.
> For HBase, we are the client and the HDFS datanodes are the servers. A slow datanode may exhaust
> most of the handlers. The best way to resolve this issue is to make HDFS requests non-blocking, which
> is exactly what SEDA does.
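
The two-stage split described in the issue can be sketched roughly as follows. This is a hedged illustration only, not HBase code: the class SedaSketch, the pool sizes 4 and 16, the in-memory map standing in for MemStore/cache, and the fake readFromStore call standing in for an HDFS read are all assumptions. Each fixed thread pool has its own internal queue, so a pool plus its queue plays the role of one stage.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of two SEDA stages with independently sized pools.
final class SedaSketch {
    // CPU-bound stage (MemStore/cache lookups); size 4 is an arbitrary example value.
    private final ExecutorService cpuStage = Executors.newFixedThreadPool(4);
    // IO-bound stage (stand-in for HDFS reads); size 16 is an arbitrary example value.
    private final ExecutorService ioStage = Executors.newFixedThreadPool(16);

    private final Map<String, String> memory = new ConcurrentHashMap<>();

    void putInMemory(String row, String value) {
        memory.put(row, value);
    }

    // A request enters the CPU stage first and moves to the IO stage only on a miss.
    CompletableFuture<String> get(String row) {
        CompletableFuture<String> cached =
            CompletableFuture.supplyAsync(() -> memory.get(row), cpuStage);
        return cached.thenCompose(hit -> {
            if (hit != null) {
                return CompletableFuture.completedFuture(hit);
            }
            // Hand-off to the next stage: the CPU thread is released immediately.
            return CompletableFuture.supplyAsync(() -> readFromStore(row), ioStage);
        });
    }

    // Placeholder for a slow, blocking read; a real one would go to HDFS.
    private String readFromStore(String row) {
        return "disk:" + row;
    }

    void shutdown() {
        cpuStage.shutdown();
        ioStage.shutdown();
    }
}
```

With this shape, a slow read only occupies threads in ioStage; lookups that hit memory keep completing on cpuStage, which is the availability argument made above.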

This message was sent by Atlassian JIRA
