hbase-issues mailing list archives

From "Appy (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (HBASE-19457) Debugging flaky TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
Date Fri, 15 Dec 2017 22:44:00 GMT

    [ https://issues.apache.org/jira/browse/HBASE-19457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16293380#comment-16293380 ]

Appy edited comment on HBASE-19457 at 12/15/17 10:43 PM:
---------------------------------------------------------

We discussed a few things; here's the summary:
- We have procs spawning subprocs, but we're not sure whether there's an example where this
tree's depth is > 2. If yes, we can change the truncate proc to just delete proc + create proc.

bq. As a step in truncate before we create the new? Wonder why this needs it and CreateTable
doesn't (I think you ask this above).
Both have an ADD_TO_META step where they add regions to meta. But when we fail after that:
- in case of truncate proc, there's a table row in meta with state null --> it gets assumed
to be enabled --> AM starts interfering
- in case of create proc, there's no table row at all --> AM ignores those new regions
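
To make the difference between the two cases concrete, here is a minimal sketch. It is not HBase code: the class, the method name, and the use of a plain map to stand in for the table rows in meta are all assumptions made for illustration. A key mapped to null models "table row with state null"; a missing key models "no table row at all".

```java
import java.util.HashMap;
import java.util.Map;

public class NullTableStateSketch {
    // Hypothetical model of the table rows in meta: table name -> state.
    // Returns true when AM would start acting on the table's new regions.
    static boolean amWouldAssignNewRegions(Map<String, String> tableRows, String table) {
        if (!tableRows.containsKey(table)) {
            // Create proc case: no table row at all, so AM ignores the regions.
            return false;
        }
        String state = tableRows.get(table);
        // Truncate proc case: a null state is assumed to be ENABLED.
        return state == null || state.equals("ENABLED");
    }

    public static void main(String[] args) {
        Map<String, String> tableRows = new HashMap<>();
        // Row left behind by the failed truncate, with null state.
        tableRows.put("truncated_table", null);
        // "created_table" intentionally has no row.
        System.out.println(amWouldAssignNewRegions(tableRows, "truncated_table")); // true
        System.out.println(amWouldAssignNewRegions(tableRows, "created_table"));   // false
    }
}
```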

New stuff:
Stack recently committed HBASE-18946, which fixes issues around the balancer and assignment. After
it went in, we see more greens for TestTruncateTableProcedure in the flaky dashboard.
A word on that:
When AM interfered on recovery (see "...recovery: TableStateManager treats table with null
state as ENABLED. AM treats regions with null state as offline. Combined result - AM starts
assigning the new " in description), it started Assign procs. But they got stuck for some
reason (which I didn't debug as part of this test fix since it's unrelated). His patch
makes that case better.
But the real fix here should be to correctly handle state in TTP (TruncateTableProcedure) so
that AM doesn't interfere.
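
As a rough illustration of why handling the state matters, the sketch below replays the truncate steps from the issue description up to the crash, then applies the recovery rules quoted above. It compares deleting the table state with the fix proposed in the description (mark the table as disabled instead). Everything here is hypothetical: the class, the maps standing in for meta, the region names, and the assumed starting state are not taken from HBase.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TruncateRecoverySketch {
    // Returns the regions AM would try to assign after a crash mid-truncate.
    static List<String> regionsAssignedOnRecovery(boolean markDisabled) {
        // Hypothetical stand-ins for meta: name -> state (null = state missing).
        Map<String, String> tableRows = new LinkedHashMap<>();
        Map<String, String> regionRows = new LinkedHashMap<>();
        tableRows.put("t", "DISABLED");         // assumed starting state
        regionRows.put("old-region", "CLOSED"); // assumed starting state

        regionRows.clear();                     // delete region states from meta
        // Either delete the table state (row keeps a null state) or apply the fix.
        tableRows.put("t", markDisabled ? "DISABLED" : null);
        regionRows.put("new-region-1", null);   // add new regions with state null
        regionRows.put("new-region-2", null);
        // ....crash happens here; what follows models recovery.

        String state = tableRows.get("t");
        boolean treatedAsEnabled = state == null || state.equals("ENABLED");
        List<String> assigned = new ArrayList<>();
        if (treatedAsEnabled) {
            for (Map.Entry<String, String> e : regionRows.entrySet()) {
                if (e.getValue() == null) {     // null region state is treated as offline
                    assigned.add(e.getKey());
                }
            }
        }
        return assigned;
    }

    public static void main(String[] args) {
        System.out.println(regionsAssignedOnRecovery(false)); // [new-region-1, new-region-2]
        System.out.println(regionsAssignedOnRecovery(true));  // []
    }
}
```

With the table explicitly marked as disabled, the recovery rules in this model leave the half-created regions alone until the procedure itself resumes.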

We'll keep an eye on the dashboard, look at any new failures, and then decide the verdict on
this jira's patch.

In the meantime, opened new jiras to discuss the other questions: HBASE-19529, HBASE-19530



> Debugging flaky TestTruncateTableProcedure#testRecoveryAndDoubleExecutionPreserveSplits
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-19457
>                 URL: https://issues.apache.org/jira/browse/HBASE-19457
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Appy
>            Assignee: Appy
>         Attachments: HBASE-19457.master.001.patch, patch1, test-output.txt
>
>
> Trying to explain the bug in a more general way where understanding of ProcedureV2 is
> not required.
> Truncating table operation:
> ....
> delete region states from meta
> delete table state from meta
> ....
> add new regions to meta with state null.
> ....crash
> ....recovery: TableStateManager treats table with null state as ENABLED. AM treats regions
> with null state as offline. Combined result - AM starts assigning the new regions from incomplete
> truncate operation.
> Fix: Mark table as disabled instead of deleting its state.
> ----
> *patch1*
> Just added some logging to help with debugging:
> - 60s was too short a timeout, so increased it
> - Added some useful log statements



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
