hbase-issues mailing list archives

From "Enis Soztutar (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HBASE-16095) Add priority to TableDescriptor and priority region open thread pool
Date Tue, 19 Jul 2016 18:59:20 GMT

    [ https://issues.apache.org/jira/browse/HBASE-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384679#comment-15384679 ]

Enis Soztutar commented on HBASE-16095:

Thanks Stack for taking a look. 
bq. can't we keep phoenix stuff up in phoenix? Secondary indices via transaction are almost
here. Isn't that the proper fix rather than adding new pools to hbase (we don't need more
pools), etc.
Unfortunately no. This happens in region open, so we need a mechanism to inject / configure
region opening, nothing related to RPC scheduling. 
bq. Why we need this change if configuring below could address deadlock?
That configuration addresses the deadlock on RPCs and regular index writes. This particular issue
is about the writes happening to the index region when we are opening the data region. The secondary
index recovery mechanism depends on the index region(s) being online. The writes happen in a blocking
manner, so we block the actual region opener thread. Since the same region opener threads
are used to open both data and index regions, a deadlock happens.
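
To make the failure mode concrete, here is a minimal, self-contained sketch of it using plain java.util.concurrent. It is not HBase code: the class and method names are hypothetical, a CountDownLatch stands in for "the index region is online", and the pool size of 3 is taken from the default mentioned in the issue description. Each simulated data region open blocks until the index region is online, so three data opens can occupy every worker while the index open waits behind them in the queue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; none of these names exist in HBase.
public class RegionOpenDeadlockSketch {

  /**
   * Submits dataRegions simulated data region opens, then one index region open.
   * Each data open blocks until the index region is online (the blocking index write).
   * Returns true if every data region finished opening within timeoutMs.
   */
  public static boolean openAll(ExecutorService dataPool, ExecutorService indexPool,
                                int dataRegions, long timeoutMs) throws InterruptedException {
    CountDownLatch indexOnline = new CountDownLatch(1);
    CountDownLatch dataOpened = new CountDownLatch(dataRegions);
    for (int i = 0; i < dataRegions; i++) {
      dataPool.execute(() -> {
        try {
          indexOnline.await();      // blocks the opener thread itself
          dataOpened.countDown();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
    }
    // The index open is queued after the data opens.
    indexPool.execute(indexOnline::countDown);
    return dataOpened.await(timeoutMs, TimeUnit.MILLISECONDS);
  }

  /** One shared pool of 3 workers: all workers block, the index open never runs. */
  public static boolean runSharedPool() throws InterruptedException {
    ExecutorService shared = Executors.newFixedThreadPool(3);
    try {
      return openAll(shared, shared, 3, 500);
    } finally {
      shared.shutdownNow();         // interrupt the stuck workers so the JVM can exit
    }
  }

  /** A separate pool for the index open: the dependency can always make progress. */
  public static boolean runSeparatePools() throws InterruptedException {
    ExecutorService regular = Executors.newFixedThreadPool(3);
    ExecutorService priority = Executors.newFixedThreadPool(3);
    try {
      return openAll(regular, priority, 3, 5000);
    } finally {
      regular.shutdownNow();
      priority.shutdownNow();
    }
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("shared pool opened all data regions: " + runSharedPool());
    System.out.println("separate pools opened all data regions: " + runSeparatePools());
  }
}
```

With the shared pool the call times out and reports false; with a second pool for the index open it reports true. The timeout exists only so the sketch terminates; a real region server would simply stay stuck.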
bq. This sort of dependence amongst regions – i.e. the index has to be online before data
region can come on line – is not supported in hbase; what happens if server carrying index
region crashes... and other scenarios, etc. Has it been worked through? If so, where can I
read about it?
I am not sure where you can read more. There were presentations online, but the implementation
in Phoenix is some years old and has had some changes since.
bq. We have a mechanism for onlining important regions already that has loads of holes in
it (meta, namespace, etc.). The new AMv2 will go a long ways toward plugging a bunch of them.
In this issue we are proposing a new means of doing a similar thing but on an even shakier
Not quite the same thing. AM / Master can prioritize the opening of regions, but we cannot
control all the timing from a master perspective. We cannot time new tables being created
while servers are going down and WAL recovery is happening, etc. So there will never be a
perfect-and-strict ordering that can be done from a master perspective if, for example, we want
to ensure index table regions are assigned before the data table regions from AM. AM can do a
best effort job. On the other hand, region servers do not need to order the incoming region open
requests. If there is no dependency, then having a fixed thread pool to open regions works.
If there is a dependency, then it does not.

bq. Seems dodgy Enis Soztutar, brittle as Gary Helmling says.
See my comment at https://issues.apache.org/jira/browse/HBASE-16095?focusedCommentId=15347538&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15347538.
Transactions are an optional concept in Phoenix, and they are still not GA. Even if they were, not
all use cases need transactions. We should still support secondary indexes without transactions
in Phoenix for some time. I agree that the mutable index architecture as it is today should be
redesigned to remove the inter-region dependency and the blocking of handlers. I am working on a
proposal to do this using replication, but getting that fully working will take some time. Until then,
we have real users and customers running with the current stuff that need the fix.

bq. Phoenix users will have to ensure they configure all index tables as PRIORITY (making
index tables 'high priority' is a little unexpected)? For preexisting tables they'll have
to go through and enable this everywhere? 
I should have linked the Phoenix issue. My bad. PHOENIX-3072 is the fix in Phoenix that would
automatically configure the priorities in Phoenix.

BTW, I think that the priority definition in the table descriptor also serves another purpose.
We can use it in RPC scheduling itself, so it should be useful in its own right regardless of
Phoenix. Moreover, I was thinking that although HBase "does not support" region interdependencies,
we still have important tables with dependencies in most of the frameworks, like the commit table
in Omid and the catalog/stats table in Phoenix, as well as hbase-level system tables that use this.
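
As a rough sketch of the idea, a region server could pick the opener pool from a priority carried with the table. The code below is an illustration under assumptions, not the patch: TableInfo is a hypothetical stand-in for the table descriptor, NORMAL_PRIORITY is an assumed default, and the pool sizes are arbitrary.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration only; none of these names are HBase APIs.
public class PriorityRegionOpenRouter {

  /** Assumed default priority; tables above it are treated as high priority. */
  public static final int NORMAL_PRIORITY = 0;

  /** Hypothetical stand-in for a table descriptor that carries a priority. */
  public static final class TableInfo {
    final String name;
    final int priority;

    public TableInfo(String name, int priority) {
      this.name = name;
      this.priority = priority;
    }
  }

  private final ExecutorService regularOpeners = Executors.newFixedThreadPool(3);
  private final ExecutorService priorityOpeners = Executors.newFixedThreadPool(3);

  /** Regions of high priority tables never queue behind regular region opens. */
  public ExecutorService poolFor(TableInfo table) {
    return table.priority > NORMAL_PRIORITY ? priorityOpeners : regularOpeners;
  }

  /** Readable label for the chosen pool, handy for logging and tests. */
  public String poolNameFor(TableInfo table) {
    return poolFor(table) == priorityOpeners ? "priority" : "regular";
  }

  /** Submits a region open task to the pool selected for the table. */
  public void submitOpen(TableInfo table, Runnable openTask) {
    poolFor(table).execute(openTask);
  }

  public void shutdown() {
    regularOpeners.shutdownNow();
    priorityOpeners.shutdownNow();
  }
}
```

The same priority value could in principle feed an RPC scheduler, since the routing decision only needs the table's priority and nothing about the individual operation.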

> Add priority to TableDescriptor and priority region open thread pool
> --------------------------------------------------------------------
>                 Key: HBASE-16095
>                 URL: https://issues.apache.org/jira/browse/HBASE-16095
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Enis Soztutar
>            Assignee: Enis Soztutar
>             Fix For: 2.0.0, 1.3.0, 1.4.0, 0.98.21
>         Attachments: HBASE-16095-0.98.patch, HBASE-16095-0.98.patch, hbase-16095_v0.patch,
hbase-16095_v1.patch, hbase-16095_v2.patch, hbase-16095_v3.patch
> This is in a similar area to HBASE-15816, and is also required by the current secondary
indexing for Phoenix. 
> The problem with Phoenix secondary indexes is that data table regions depend on index regions
to be able to make progress. Possible distributed deadlocks can be prevented via custom RpcScheduler
+ RpcController configuration via HBASE-11048 and PHOENIX-938. However, region opening also
has the same deadlock situation, because data region open has to replay the WAL edits to the
index regions. There is only 1 thread pool to open regions, with 3 workers by default. So if
the cluster is recovering / restarting from scratch, the deadlock happens because some index
regions cannot be opened: they sit in the same queue behind data region opens, which in turn
wait on RPCs to index regions that are not yet open. This is reproduced in almost
all Phoenix secondary index clusters (mutable table w/o transactions) that we see. 
> The proposal is to have a "high priority" region opening thread pool, and have the HTD
carry the relative priority of a table. This may be useful for other "framework" level tables
from Phoenix, Tephra, Trafodion, etc. if they want some specific tables to become online faster.

> As a follow up patch, we can also take a look at how this priority information can be
used by the rpc scheduler on the server side or rpc controller on the client side, so that
we do not have to set priorities manually per-operation. 

This message was sent by Atlassian JIRA
